DOCUMENT RESUME 



ED 469 378    TM 034 528

AUTHOR: Hu, Ming-xiu; Salvucci, Sameena
TITLE: A Study of Imputation Algorithms. Working Paper Series.
INSTITUTION: Synectics for Management Decisions, Inc., Arlington, VA.; National Center for Education Statistics (ED), Washington, DC.
REPORT NO: NCES-WP-2001-17
PUB DATE: 2001-09-00
NOTE: 120p.
AVAILABLE FROM: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics, 1990 K Street NW, Room 9048, Washington, DC 20006. For full text: http://www.nces.ed.gov/pubsearch/.
PUB TYPE: Reports - Descriptive (141)
EDRS PRICE: EDRS Price MF01/PC05 Plus Postage.
DESCRIPTORS: *Algorithms; Computer Simulation; *Data Analysis; Longitudinal Studies; Monte Carlo Methods; *National Surveys; *Research Methodology; *Selection
IDENTIFIERS: *Imputation; *Missing Data



ABSTRACT 

Many imputation techniques and imputation software packages 
have been developed over the years to deal with missing data. Different 
methods may work well under different circumstances, and it is advisable to 
conduct a sensitivity analysis when choosing an imputation method for a 
particular survey. This study reviewed about 30 imputation methods and 5 
imputation software packages. Eleven of the most popular imputation methods 
were evaluated through a Monte Carlo simulation study. The first four 
chapters of this report are methodology discussions based on a review of the 
literature on imputation. Chapter 1 describes about 30 commonly used 
methods, including those used by the National Center for Education 
Statistics, and discusses their strengths and weaknesses. Chapter 2 focuses 
on five software packages for imputation. Nonresponse bias correction through 
imputation is addressed in chapter 3, and variance estimation with imputed 
data and multiple imputation inference is discussed in chapter 4. Chapter 5 
reports the results of the simulation study, which evaluated 11 methods 
according to 8 evaluation criteria for 4 types of distributions, 5 types of 
missing mechanisms, and 4 types of missing rates. (Contains 31 tables and 45 
references.) (SLD) 



Reproductions supplied by EDRS are the best that can be made 
from the original document.






NATIONAL CENTER FOR EDUCATION STATISTICS 



Working Paper Series 






A Study of Imputation Algorithms 



Working Paper No. 2001-17 



September 2001 



Contact: Ralph Lee 

Statistical Standards Program 
E-mail: ralph.lee@ed.gov 



U.S. DEPARTMENT OF EDUCATION 
Office of Educational Research and Improvement 
EDUCATIONAL RESOURCES INFORMATION 
CENTER (ERIC) 

[x] This document has been reproduced as 
received from the person or organization 
originating it. 

[ ] Minor changes have been made to 
improve reproduction quality. 

Points of view or opinions stated in this 
document do not necessarily represent 
official OERI position or policy. 



U. S. Department of Education 

Office of Educational Research and Improvement 







best copy available 



NATIONAL CENTER FOR EDUCATION STATISTICS 



Working Paper Series 



The Working Paper Series was initiated to promote the sharing of the 
valuable work experience and knowledge reflected in these preliminary 
reports. These reports are viewed as works in progress, and have not 
undergone a rigorous review for consistency with NCES Statistical 
Standards prior to inclusion in the Working Paper Series. 



U. S. Department of Education 

Office of Educational Research and Improvement 



U.S. Department of Education 

Rod Paige 
Secretary 

Office of Educational Research and Improvement 

Grover J. Whitehurst 
Assistant Secretary 

National Center for Education Statistics 

Gary W. Phillips 
Acting Commissioner 



The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, 
and reporting data related to education in the United States and other nations. It fulfills a congressional 
mandate to collect, collate, analyze, and report full and complete statistics on the condition of education in 
the United States; conduct and publish reports and specialized analyses of the meaning and significance of 
such statistics; assist state and local education agencies in improving their statistical systems; and review 
and report on education activities in foreign countries. 

NCES activities are designed to address high priority education data needs; provide consistent, reliable, 
complete, and accurate indicators of education status and trends; and report timely, useful, and high quality 
data to the U.S. Department of Education, the Congress, the states, other education policymakers, 
practitioners, data users, and the general public. 

We strive to make our products available in a variety of formats and in language that is appropriate to a 
variety of audiences. You, as our customer, are the best judge of our success in communicating 
information effectively. If you have any comments or suggestions about this or any other NCES product or 
report, we would like to hear from you. Please direct your comments to: 

National Center for Education Statistics 

Office of Educational Research and Improvement 

U.S. Department of Education 

1990 K Street NW 

Washington, DC 20006 

September 2001 



The NCES World Wide Web Home Page is 
http://nces.ed.gov 



Suggested Citation 

U.S. Department of Education. National Center for Education Statistics. A Study of Imputation 
Algorithms. Working Paper No. 2001-17, by Ming-xiu Hu and Sameena Salvucci. Project Officer, Ralph 
Lee. Washington, DC: 2001. 






Foreword 



In addition to official NCES publications, NCES staff and individuals commissioned by NCES 
produce preliminary research reports that include analyses of survey results, and presentations of 
technical, methodological, and statistical evaluation issues. 

The Working Paper Series was initiated to promote the sharing of the valuable work 
experience and knowledge reflected in these preliminary reports. These reports are viewed as works in 
progress, and have not undergone a rigorous review for consistency with NCES Statistical Standards 
prior to inclusion in the Working Paper Series. 

Copies of Working Papers can be downloaded as pdf files from the NCES Electronic Catalog 
(http://nces.ed.gov/pubsearch/), or contact Sheilah Jupiter at (202) 502-7444, 
e-mail: sheilahjupiter@ed.gov, or mail: U.S. Department of Education, Office of Educational Research 
and Improvement, National Center for Education Statistics, 1990 K Street NW, Room 9048, 
Washington, DC 20006. 



Marilyn M. Seastrom 
Chief Mathematical Statistician 
Statistical Standards Program 



Ralph Lee 

Mathematical Statistician 
Statistical Standards Program 



A Study of Imputation Algorithms 



Prepared by: 



Ming-xiu Hu 
Sameena Salvucci 

Synectics for Management Decisions, Inc. 



Prepared for: 

U.S. Department of Education 
Office of Educational Research and Improvement 
National Center for Education Statistics 



September 2001 
(originally delivered 1998) 



Table of Contents 



Introduction 

Chapter 1 Imputation Algorithms 
  1.1 Simple deterministic imputation method 
    1.1.1 Deductive imputation 
    1.1.2 Overall or cell mean imputation (also called adjusted mean imputation or substitution method) 
    1.1.3 Deterministic hot deck imputation 
  1.2 Simple random imputation methods 
    1.2.1 Overall or cell mean imputation with random disturbance 
    1.2.2 Random hot deck method 
    1.2.3 Overall random imputation 
    1.2.4 Approximate Bayesian Bootstrap (ABB) 
    1.2.5 Bayesian Bootstrap (BB) 
    1.2.6 Within-class random imputation 
  1.3 Model-based deterministic imputation methods 
    1.3.1 Ratio imputation 
    1.3.2 Predicted regression imputation 
    1.3.3 EM algorithm 
    1.3.4 Dear’s principal component method (DPC) 
    1.3.5 General iterative principal (GIP) component method 
    1.3.6 Singular value decomposition (SVD) method 
    1.3.7 A comparison of ASM, EM, DPC, GIP, and SVD 
  1.4 Model-based random imputation methods 
    1.4.1 Draw imputations from predicted distributions 
    1.4.2 Random regression imputation 
    1.4.3 Ratio with random disturbance imputation 
    1.4.4 Modeling non-ignorable missing mechanism 
  1.5 Imputation methods related to Bayesian theories 
    1.5.1 Data augmentation 
    1.5.2 Adjusted data augmentation 
    1.5.3 Sequential imputation method 
  1.6 Imputation practice across NCES surveys 

Chapter 2 Imputation Software Products 
  2.1 PROC IMPUTE (See 1.2.6 Within-class random imputation) 
  2.2 Schafer’s imputation software (See 1.5.1 Data augmentation under Imputation methods related to Bayesian theories) 
  2.3 IRMA 
  2.4 GEIS and GES 
  2.5 SOLAS for Missing Data Analysis 1.0 

Chapter 3 Nonresponse Bias 

Chapter 4 Variance Estimation and Multiple Imputation 
  4.1 Add imputation variance without multiple imputation 
  4.2 Jackknife variance estimation with imputed data 
    4.2.1 Jackknife variance estimation with imputed data for stratified random sampling 
    4.2.2 Jackknife variance estimation with fractionally weighted imputation 
  4.3 Multiple imputation inference 
    4.3.1 Objectives of imputations 
    4.3.2 Multiple imputation inference 
    4.3.3 Current issues concerning multiple imputation 

Chapter 5 Simulation Study 
  5.1 Simulation design 
    5.1.1 Distribution 
    5.1.2 Missing mechanism 
    5.1.3 Missing rates 
    5.1.4 Imputation methods 
  5.2 Simulation results 
    5.2.1 Bias of population mean estimates 
    5.2.2 Bias of variance estimates with single imputation 
    5.2.3 Bias of variance estimates of population mean with five sets of imputations 
    5.2.4 Coverage rates 
    5.2.5 Confidence interval width 
    5.2.6 Bias of quartile estimates 
    5.2.7 Average imputation error 

References 



List of Tables 



Table 1.6.1 — Imputation methods used across NCES surveys 
Table 4.1.1 — Contribution of each variance component to the total variance for the SRSWOR sampling with the mean imputation method 
Table 5.2.1.1 — Bias of population mean estimates (overall) 
Table 5.2.1.2 — Bias of population mean estimates with about 10% missing values 
Table 5.2.1.3 — Bias of population mean estimates with about 20% missing values 
Table 5.2.1.4 — Bias of population mean estimates with about 30% missing values 
Table 5.2.1.5 — Bias of population mean estimates with about 40% missing values 
Table 5.2.2.1 — Relative bias of variance estimates with single imputation (overall) 
Table 5.2.2.2 — Relative bias of variance estimates with single imputation with about 10% missing values 
Table 5.2.2.3 — Relative bias of variance estimates with single imputation with about 20% missing values 
Table 5.2.2.4 — Relative bias of variance estimates with single imputation with about 30% missing values 
Table 5.2.2.5 — Relative bias of variance estimates with single imputation with about 40% missing values 
Table 5.2.3.1 — Relative bias of variance estimates with five sets of imputations (overall) 
Table 5.2.3.2 — Relative bias of variance estimates with five sets of imputations with about 10% missing values 
Table 5.2.3.3 — Relative bias of variance estimates with five sets of imputations with about 20% missing values 
Table 5.2.3.4 — Relative bias of variance estimates with five sets of imputations with about 30% missing values 
Table 5.2.3.5 — Relative bias of variance estimates with five sets of imputations with about 40% missing values 
Table 5.2.4.1 — Coverage rates with single imputation (overall) 
Table 5.2.4.2 — Coverage rates with single imputation with about 10% missing values 
Table 5.2.4.3 — Coverage rates with single imputation with about 20% missing values 
Table 5.2.4.4 — Coverage rates with single imputation with about 30% missing values 
Table 5.2.4.5 — Coverage rates with single imputation with about 40% missing values 
Table 5.2.5.1 — Confidence interval width with single imputation (overall) 
Table 5.2.6.1 — Biases of the first quartile estimates (overall) 
Table 5.2.6.2 — Biases of the third quartile estimates (overall) 
Table 5.2.6.3 — Biases of median estimates (overall) 
Table 5.2.7.1 — Average imputation error (overall) 
Table 5.2.7.2 — Average imputation error with about 10% missing values 
Table 5.2.7.3 — Average imputation error with about 20% missing values 
Table 5.2.7.4 — Average imputation error with about 30% missing values 
Table 5.2.7.5 — Average imputation error with about 40% missing values 







Introduction 



No matter how well a survey questionnaire is designed and no matter how efficient the data 
collection procedure is, missing values almost always exist in survey data. There are 
two main reasons for missing values: survey (or unit) nonresponse and item nonresponse. 
Survey nonresponse occurs, for example, when sampled subjects cannot be contacted, 
refuse to respond altogether, or are found to be out-of-scope. Item nonresponse occurs, 
for example, when sampled subjects refuse or are unable to answer certain questions, 
when interviewers fail to ask the question or fail to record the answer, or when an 
inconsistent response is deleted in data editing. 

One of the most common methods to compensate for survey nonresponse is through weighting 
adjustments; that is, to reassign the weights of the nonrespondents to the respondents. 

However, there are some problems with the use of weighting adjustments for dealing with unit 
nonresponse (Rubin 1996): 

• Even in the simplest case of unit nonresponse, where the shared data base of respondents is 
fully observed (i.e., there is no item nonresponse), many ultimate users’ complete-data 
analyses do not allow for sampling weights. 

• Even with complete-data analyses that can deal with sampling weights, the construction of 
intervals and p-values that validly account for the fact that nonresponse adjustments in the 
weights are estimated from data is not immediate from complete-data analyses. 

• With general patterns of nonresponse, special analysis methods need to be developed and 
special software needs to be written. 

• Weighting adjustments are focused on unbiased estimation and are essentially blind to 
efficiency concerns. 

Given these problems with using weighting adjustments, imputation has become one of the 
most popular tools used to solve missing value problems in survey data analyses. The use of 
imputation to create complete data can have the following advantages: 

• Data collectors usually have more inside knowledge about the reasons for the missing 
values. This inside knowledge can be used in imputation; 

• Missing values complicate the data structure, so that more sophisticated statistical tools are 
required to conduct analyses. Imputation may ease this difficulty; 

• Imputation can prevent the loss of information due to deletion of incomplete records if the 
statistical methods used (e.g., regression) require complete records; 







• Imputation can reduce nonresponse bias in some situations; 

• Pairwise correlation matrices computed from incomplete data may not be positive definite. 
Imputation can avoid this problem. 

The basic objective of imputation is to allow ultimate data users to apply their existing analysis 
tools to any data set with missing values using the same command structure and output 
standards as if there were no missing data. Most methods for handling missing data, such as 
“complete-case analysis,” “available-case analysis,” and “fill-in with means,” satisfy this basic 
objective and so have a certain appeal. However, it is certainly not enough to just achieve this 
basic objective. Another desirable objective is statistical validity: assuming that the ultimate 
user’s complete-data analysis is statistically valid for a scientific estimand, the answer that 
results from applying the same analysis method to an incomplete data set remains statistically 
valid for the same estimand, provided the database constructor’s posited model for the missing 
data is true. This goal can be achieved through some imputation methods, but cannot be 
achieved through others. 

It is a common misunderstanding that the goal of imputation is to predict individual 
missing values. This view is popular because hot deck imputation methods attempt to find 
the best match (donor) for each missing case. A better estimate for each missing value does 
not necessarily lead to a better overall estimate for the parameters of interest. Here is a 
counterexample given by Rubin (1996): suppose we have a coin that, in truth, is biased .6 heads 
and .4 tails. This known truth is model A, whereas model B asserts that the coin has two heads. 
Using model A for creating imputations (i.e., future predictions) yields a hit rate (agreements 
between predictions and outcomes) of .6 x .6 + .4 x .4 = .52, whereas using model B for 
predictions yields a hit rate of .6. This does not mean that model B is better than model A for 
handling missing values. Filling in missing values using model B yields the invalid statistical 
inference that in the future all coin tosses will be heads, clearly inconsistent for the estimand Q = 
fraction of tosses that are heads, whereas using model A yields consistent estimates for all such 
scientific estimands. 
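
The arithmetic in the coin example can be checked with a short Python sketch. The function names are illustrative and not part of the original report; the second function simply draws imputations from a model and reports the fraction of heads among them.

```python
import random

def hit_rate(p_heads_truth, p_heads_model):
    """Probability that a prediction drawn from the model agrees with a toss."""
    return (p_heads_truth * p_heads_model
            + (1 - p_heads_truth) * (1 - p_heads_model))

def imputed_fraction_heads(p_heads_model, n_missing, rng):
    """Fraction of heads among values imputed by drawing from the model."""
    draws = [rng.random() < p_heads_model for _ in range(n_missing)]
    return sum(draws) / n_missing

rng = random.Random(0)
rate_a = hit_rate(0.6, 0.6)   # model A: .6 x .6 + .4 x .4
rate_b = hit_rate(0.6, 1.0)   # model B: always predicts heads
frac_a = imputed_fraction_heads(0.6, 10_000, rng)  # close to the true .6
frac_b = imputed_fraction_heads(1.0, 10_000, rng)  # every imputed toss is heads
print(rate_a, rate_b, frac_a, frac_b)
```

Model B wins on hit rate, yet the data it fills in imply that the fraction of heads is 1, which is the inconsistency the example is meant to expose.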

Many imputation techniques and imputation software packages have been developed over the 
years. Different methods may work well under different circumstances. It is advisable to 
conduct a sensitivity analysis when choosing an imputation method for a particular survey. 

This task reviewed about thirty imputation methods and five imputation software packages. 
Eleven of the most popular imputation methods were evaluated through a Monte Carlo 
simulation study. 

This report consists of five chapters. The first four chapters are methodology discussions 
based on our review of numerous papers and books. Chapter 1 describes about thirty of the most 
commonly used imputation methods with brief discussions of their strengths and weaknesses. 
The imputation methods used across the national surveys conducted by the National Center for 
Education Statistics (NCES) are also summarized in this chapter. Chapter 2 discusses five 
imputation software packages. Nonresponse bias correction via imputation is addressed in 
chapter 3. Variance estimation with imputed data and multiple imputation inference are discussed 
in chapter 4. Chapter 5 reports the results of the simulation study, which evaluates 11 imputation 
methods according to eight evaluation criteria for four types of distributions, five types of missing 
mechanisms, and four types of missing rates. 

Chapter 1 Imputation Algorithms 



Imputation methods are generally classified into two categories: random (also called 
stochastic) and deterministic. A deterministic imputation method determines one and only one 
possible value for imputing each missing case. Once the imputation scheme is set up, the 
imputation result is unique. On the other hand, a random imputation method draws imputation 
values randomly either from the observed data or from the predicted distribution. Multiple sets 
of imputations can be created to capture the uncertainty between imputations via any random 
imputation method. Generally, a random imputation method adds more variability to the 
statistics computed from an imputed data set than a deterministic imputation method. 

However, in this chapter, we will discuss imputation techniques under five categories: 

• Simple deterministic imputation 

• Simple random imputation 

• Model-based deterministic imputation 

• Model-based random imputation 

• Bayesian-related imputation methods 

It is easy to see that these five categories are not mutually exclusive; we are using them mainly 
for convenience of discussion. 

1.1 Simple deterministic imputation method 

1.1.1 Deductive imputation 

This method deduces missing values from available information, such as similar items in previous 
surveys, related items in current surveys, etc. To apply this method, the user needs to find some 
deterministic relationship between the missing item and items from other resources. Cold deck is 
one deductive imputation method that uses information from previous similar surveys. Generally, 
it is impossible to find enough information to impute all missing items in a survey using deductive 
imputation, but this method can be used to impute some of the missing variables. Whenever 
possible, deductive imputation should be used before any other imputation method because it 
provides accurate or approximately accurate imputations for missing cases. However, the 
performance of a deductive imputation method completely depends on the available sources. 

1.1.2 Overall or cell mean imputation (also called adjusted mean imputation or substitution 
method) 

This is the simplest but least attractive imputation method. Overall mean imputation uses the 
overall sample mean to replace all missing values in the data set. This method can provide 
unbiased estimates for the population means or totals only if the missing values are missing 
completely at random (MCAR). Cell mean imputation first uses some auxiliary variables to form 
imputation cells, and then replaces missing values in each cell with its sample mean. The method 
can give unbiased estimates for the population mean or total if the missing values only depend on 
the auxiliary variables which are used to construct the imputation cells. However, the distribution 
of the data will be distorted substantially and the concentration of all imputed values at the cell 
means creates spikes in the distribution. Therefore, quartile estimates will be biased, and the 
variances materially underestimated. 



If the mean imputation method is used, it is advisable to calculate the variance-covariance 
estimates using a denominator of $n - m - 1$ instead of $n - 1$, where $n$ is the sample size and $m$ is the 
number of cases missing one or both variables for pairwise covariance estimate calculation. We 
will call this strategy the adjusted mean imputation (or substitution) method in this report. 
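
The effect of the adjusted denominator can be seen in a minimal univariate sketch. It covers only the variance of a single variable (so $m$ is the number of missing values), and the function names and toy data are assumptions made for illustration.

```python
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def adjusted_variance(imputed, n_missing):
    """Variance of mean-imputed data with denominator n - m - 1."""
    n = len(imputed)
    mean = statistics.fmean(imputed)
    ss = sum((v - mean) ** 2 for v in imputed)
    return ss / (n - n_missing - 1)

data = [2.0, 4.0, None, 6.0, None, 8.0]
m = sum(v is None for v in data)
filled = mean_impute(data)

observed_var = statistics.variance([v for v in data if v is not None])
naive_var = statistics.variance(filled)        # denominator n - 1
adjusted_var = adjusted_variance(filled, m)    # denominator n - m - 1
print(observed_var, naive_var, adjusted_var)
```

Because the imputed values sit exactly at the mean, they add nothing to the sum of squares; dividing by $n - 1$ therefore shrinks the variance, while dividing by $n - m - 1$ recovers the respondent variance in this univariate case.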



Cohen (1996) suggested another way to adjust variance estimates by imputing more diversified 
values for the missing cases. For example, instead of imputing the mean for all the missing 
values, Cohen suggested imputing half of the missing values with 
$\bar{y}_r + \sqrt{\frac{n+r-1}{r-1}}\, D_r$ and the other half with 
$\bar{y}_r - \sqrt{\frac{n+r-1}{r-1}}\, D_r$, where $r$ is the number of response values, 
$\bar{y}_r$ is the mean of observed values, and 
$D_r^2 = \frac{1}{r} \sum_{i=1}^{r} (y_i - \bar{y}_r)^2$. This type of adjustment will retain 
the first and second moments as observed. 
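
A small numerical check of this idea is sketched below. It assumes the multiplier is $\sqrt{(n+r-1)/(r-1)}$, that $D_r^2$ uses the divisor $r$, and that the number of missing values is even; the function name and data are hypothetical.

```python
import math
import statistics

def cohen_impute(observed, n_missing):
    """Impute mean + c*D for half of the missing cases and mean - c*D for the rest."""
    if n_missing % 2:
        raise ValueError("this sketch assumes an even number of missing values")
    r = len(observed)
    n = r + n_missing
    mean = statistics.fmean(observed)
    d = math.sqrt(sum((y - mean) ** 2 for y in observed) / r)   # D_r, divisor r
    c = math.sqrt((n + r - 1) / (r - 1))                        # assumed multiplier
    half = n_missing // 2
    return observed + [mean + c * d] * half + [mean - c * d] * half

observed = [2.0, 4.0, 6.0, 8.0]
completed = cohen_impute(observed, 2)

# Variance estimate of the mean: respondents only versus completed data.
var_mean_resp = statistics.variance(observed) / len(observed)
var_mean_completed = statistics.variance(completed) / len(completed)
print(statistics.fmean(completed), var_mean_resp, var_mean_completed)
```

In this example the completed data keep the respondent mean, and the usual variance estimate of the mean computed from the completed data agrees with the one computed from the respondents alone, which is not the case when every missing value is set to the mean.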



1.1.3 Deterministic hot deck imputation 

Hot deck imputation is one of the most popular imputation methods because it is simple and 
makes intuitive sense to many practitioners who do not have a strong statistical background. 
Hot deck imputation does not employ any explicit statistical model. Its major disadvantage is 
that it cannot recover typical values for subjects with certain characteristics if no such subject 
responds to the survey. Hot deck imputation has many variants. The following are the most 
popular deterministic hot deck imputation methods. 

(1) Sequential nearest neighbor hot deck imputation. This method is also called 
traditional hot deck imputation. The first step in this method is to use some auxiliary 
variables to specify imputation classes. Second, within each imputation class, a single value 
such as the class mean or some pre-specified value is assigned as a starting point. Then the 
records in the data file are treated sequentially. If a record has a response for the target 
variable, that value replaces the previously stored value for its imputation class. If a record 
has a missing value for the target variable, it is assigned the value currently stored for its 
imputation class. 

A major attraction of this method is its computing economy, since all imputations are made 
in a single pass through the data file. A disadvantage is that this method may easily give rise 
to multiple use of donors, a feature which leads to a loss of precision for survey estimators 
(Kalton and Kasprzyk 1982). 

(2) Multivariate matching. In this method, donors and donees are matched on several 
predetermined auxiliary variables. For each missing case in each matched class, the nearest 
donor is chosen for imputation. If no donor is found in a matched class, the class is 
combined with other classes to obtain donors. 

While this method is not convenient to implement using computer programs, an 
approximately equivalent imputation algorithm may be used to replace it. The algorithm first 
sorts the data file with the same auxiliary variables, and then imputes the nearest response 
value for each missing case. This alternative method is very easy to implement. The donor 
and donee will match on all auxiliary variables if such donors are available. Otherwise, it 
will automatically find a donor matched on some of the auxiliary variables, which is 
equivalent to collapsing the matched classes. 

(3) Distance function matching. This method imputes the nearest response value for 
each missing case according to some univariate distance function of auxiliary variables, 
such as the norm in multi-dimensional Euclidean space, the Mahalanobis distance, or the 
difference between the predicted values from a regression model. 
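
Two of the deterministic hot deck variants above can be sketched in a few lines of Python. The record layout (dictionaries with keys `cls`, `x`, and `y`), the starting values, and the use of an absolute difference on a single auxiliary variable as the distance function are all assumptions made for illustration.

```python
def sequential_hot_deck(records, class_key, target, start_values):
    """Single pass: impute the value currently stored for the record's class."""
    stored = dict(start_values)          # starting point for each imputation class
    out = []
    for rec in records:
        rec = dict(rec)                  # copy so the input is left unchanged
        cls = rec[class_key]
        if rec[target] is None:
            rec[target] = stored[cls]    # use the most recent donor in the class
        else:
            stored[cls] = rec[target]    # a respondent replaces the stored value
        out.append(rec)
    return out

def nearest_donor_impute(records, aux, target):
    """Impute the response of the donor closest on one auxiliary variable."""
    donors = [r for r in records if r[target] is not None]
    out = []
    for rec in records:
        rec = dict(rec)
        if rec[target] is None:
            donor = min(donors, key=lambda d: abs(d[aux] - rec[aux]))
            rec[target] = donor[target]
        out.append(rec)
    return out

records = [
    {"cls": "A", "x": 1.0, "y": 10.0},
    {"cls": "A", "x": 2.0, "y": None},
    {"cls": "B", "x": 5.0, "y": None},
    {"cls": "B", "x": 6.0, "y": 30.0},
    {"cls": "A", "x": 3.0, "y": None},
]
seq = [r["y"] for r in sequential_hot_deck(records, "cls", "y", {"A": 0.0, "B": 25.0})]
near = [r["y"] for r in nearest_donor_impute(records, "x", "y")]
print(seq)
print(near)
```

In the sequential version the single class A respondent serves as donor twice, which illustrates the multiple use of donors noted above; the third record receives the class B starting value because no class B respondent precedes it in the file.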

1.2 Simple random imputation methods 

1.2.1 Overall or cell mean imputation with random disturbance 

To overcome the underestimated variance typical of the mean imputation method (see section 
1.1.2), we may add a small disturbance drawn from a distribution with a mean zero and 
variance-covariance matrix equal to the observed variance-covariance matrix. Most often a 
normal distribution is used to draw the random disturbance. 
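
A minimal univariate sketch of this method follows. It assumes a single target variable, a normal disturbance with mean zero and variance equal to the observed sample variance, and an illustrative function name.

```python
import random
import statistics

def mean_impute_with_noise(values, rng):
    """Impute the observed mean plus a normal disturbance for each None entry."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)      # disturbance scale from the observed data
    return [mean + rng.gauss(0.0, sd) if v is None else v for v in values]

values = [1.0, None, 3.0, None, 5.0]
filled = mean_impute_with_noise(values, random.Random(1))
print(filled)
```

For cell mean imputation the same function could be applied separately within each imputation cell.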

1.2.2 Random hot deck method 

Random hot deck imputation is one of the most popular methods in practice. It generally 
consists of three steps: (1) determine auxiliary variables on which donors and donees will match; 
(2) randomly draw imputations from observed data according to the observed frequency 
(weighted or unweighted) within each matched class; (3) if a matched class does not have any 
observed value, combine that class with other classes and perform imputation based on the 
combined imputation classes. 
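
The three steps can be sketched as follows. This is an unweighted version, and the collapsing rule (fall back to the donors of all classes when a class has none) is the simplest possible choice, assumed here only for illustration.

```python
import random
from collections import defaultdict

def random_hot_deck(records, class_key, target, rng):
    """Draw each imputation at random from the donors in the same matched class."""
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[class_key]].append(rec[target])
    all_donors = [v for vals in donors.values() for v in vals]
    out = []
    for rec in records:
        rec = dict(rec)
        if rec[target] is None:
            # Step 3: a class without donors is collapsed with all other classes.
            pool = donors.get(rec[class_key]) or all_donors
            rec[target] = rng.choice(pool)
        out.append(rec)
    return out

records = [
    {"cls": "A", "y": 1.0},
    {"cls": "A", "y": 2.0},
    {"cls": "A", "y": None},
    {"cls": "B", "y": None},
]
result = random_hot_deck(records, "cls", "y", random.Random(0))
print([r["y"] for r in result])
```

Drawing with `rng.choice` from the list of donor values reproduces the observed (unweighted) frequencies within each class; a weighted version would draw with probabilities proportional to the survey weights.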

1.2.3 Overall random imputation 

Overall random imputation generally refers to drawing imputation values randomly from 
observed data using different sampling schemes. The most frequently used scheme is resampling 
with or without replacement. It is one of the easiest methods to implement because it does not 
use any auxiliary variables; for the same reason, it cannot reduce nonresponse bias. 

1.2.4 Approximate Bayesian Bootstrap (ABB) 

The ABB method first randomly draws $r$ values with replacement from the $r$ observed values 
$Y_1, \ldots, Y_r$ to create $Y^*_{obs}$, and then randomly draws $m$ values with replacement from $Y^*_{obs}$ as 
imputed values for the $m$ missing values in the target variable $Y$. The ABB method draws 
imputations from a resample of the observed data instead of drawing directly from the observed 
data. This extra step introduces additional variation, which makes the ABB method 
approximately “proper” for multiple imputation according to Rubin’s theory (1987). (This 
method is called the Approximate Bayesian Bootstrap because it is approximately equivalent to the 
Bayesian Bootstrap described below.) 

Similarly to the overall random imputation method, when ABB imputation is performed for the 
overall sample, it will not be able to reduce nonresponse biases because it does not use any 
auxiliary information. ABB imputation may work well for within- class imputations if the missing 
mechanism only depends on the variables used to construct the imputation classes. 
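
The two-stage draw of the ABB can be sketched as follows; the function name and the toy data are illustrative. Calling the function several times yields several sets of imputations.

```python
import random

def abb_impute(observed, n_missing, rng):
    """Approximate Bayesian Bootstrap: resample the donors, then draw imputations."""
    r = len(observed)
    resample = [rng.choice(observed) for _ in range(r)]       # first stage: r draws
    return [rng.choice(resample) for _ in range(n_missing)]   # second stage: m draws

observed = [3.0, 5.0, 8.0, 13.0]
rng = random.Random(42)
imputation_sets = [abb_impute(observed, 3, rng) for _ in range(5)]
for s in imputation_sets:
    print(s)
```

For within-class imputation the function would be applied separately to the observed values of each imputation class.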

1.2.5 Bayesian Bootstrap (BB) 

BB imputation consists of two steps: (1) draw $r-1$ uniform random numbers between 0 and 1, 
and let their ordered values be $a_1, \ldots, a_{r-1}$; also let $a_0 = 0$ and $a_r = 1$, where $r$ is the number of 
observed values; (2) draw each of the $m$ missing values from $Y_1, \ldots, Y_r$ with probabilities 
$(a_1 - a_0), (a_2 - a_1), \ldots, (1 - a_{r-1})$; that is, independently $m$ times, draw a uniform random 
number $u$, and impute $Y_i$ if $a_{i-1} < u \le a_i$ ($i = 1, 2, \ldots, r$). 

Rubin (1981) showed that the Bayesian Bootstrap is equivalent to assuming that the prior 
distribution of $\pi$ is the (improper) distribution 

$$p(\pi) \propto \prod_{k=1}^{K} \pi_k^{-1},$$

where $\pi = (\pi_1, \ldots, \pi_K)$ is the vector of probabilities $\Pr(Y_i = d_k) = \pi_k$, $\sum_{k=1}^{K} \pi_k = 1$, and 
$d_1, \ldots, d_K$ are all possible distinct values in $Y_1, \ldots, Y_r$. The posterior distribution of $\pi$ is 

$$p(\pi \mid Y_{obs}) \propto \prod_{k=1}^{K} \pi_k^{r_k - 1},$$

where $r_k$ is the number of $Y_i$ that equal $d_k$, and $\sum_{k=1}^{K} r_k = r$. The posterior distribution is a 
$(K-1)$-dimensional Dirichlet distribution. The BB method first draws a value $\pi^*$ of $\pi$ from this 
posterior distribution, then independently draws imputations for missing values from among 
$d_1, \ldots, d_K$ using the probabilities in $\pi^*$. 

The difference between ABB and BB is that the underlying parameter of the data, which gives 
the probabilities of each component in $Y_{obs}$, is drawn from a scaled multinomial distribution with the 
ABB rather than from a Dirichlet distribution. Both distributions have the same means and 
correlations, but the variances for the ABB method are $(1 + 1/r)$ times the variances for the BB 
method (Rubin 1981). 
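
The two-step description of the BB, using ordered uniform numbers as cut points, can be sketched as follows. The function name is illustrative, and `bisect` is used only as a convenient way to locate the interval that contains each uniform draw.

```python
import bisect
import random

def bb_impute(observed, n_missing, rng):
    """Bayesian Bootstrap: draw donors with probabilities given by uniform gaps."""
    r = len(observed)
    # Ordered cut points a_1 <= ... <= a_{r-1}, followed by a_r = 1 (a_0 = 0 is implicit).
    cuts = sorted(rng.random() for _ in range(r - 1)) + [1.0]
    imputed = []
    for _ in range(n_missing):
        u = rng.random()
        i = bisect.bisect_left(cuts, u)   # first index whose cut point is >= u
        imputed.append(observed[i])       # the donor whose interval contains u
    return imputed

observed = [3.0, 5.0, 8.0, 13.0]
imputed = bb_impute(observed, 6, random.Random(7))
print(imputed)
```

The gaps between consecutive cut points are the donor probabilities, so one call corresponds to one draw of the probabilities followed by $m$ independent draws of imputed values.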

1.2.6 Within-class random imputation 

Random hot deck is a specific within-class random imputation method. Two factors may vary 
from one method to another in the within-class random imputation methods: how to form the 
imputation classes and how to draw imputations within each class. The three most commonly 
used methods for constructing imputation classes are as follows: 

(i) Imputation classes are formed using multiple auxiliary variables. Cases matching on 
selected auxiliary variables are classified into the same imputation class. The disadvantage 
of this method is that, as the number of auxiliary variables increases, the number of 
imputation classes can quickly become enormous. This may limit the use of auxiliary 
information in the imputation. 

(ii) Imputation classes are constructed using regression predicted values from a 
multivariate regression model. Cases with close predicted values are classified into the 
same imputation class. The use of auxiliary variables is unlimited (at least theoretically so) 
with this classification method. This method was used by imputation software PROC 
IMPUTE (version 2.0, Wise & McLaughlin, 1992). 

(iii) Imputation classes are constructed using the propensity score method (Rosenbaum 
and Rubin 1983, 1984). In brief, the idea is to find a single valued function b(X) of the 
covariates X, with the property that the desirable properties of classification on X are 
inherited by classifying on b(X). As shown by Rosenbaum and Rubin, the best such score 
is the function e(X), the propensity given X, defined as the conditional probability of 
observing the target variables Y given X. Then, the property that the missing mechanism is 
independent of Y given X, carries over to independence given the propensity score e(X), so 
that the imputation is unbiased. The propensity scores can be estimated through logistic 
regression. 

ABB and BB (described in sections 1.2.4 and 1.2.5) have already been shown to draw 
imputations within each imputation class. The following methods also do so (Gimotty & Brown 
1990). 

(i) Resampling using simple random sampling with replacement: Within the k-th 
imputation class, the imputed value is selected randomly with replacement from a 
multinomial distribution with parameter vector $\hat{p}_k$, the observed proportions of all 
possible categories. Then, given the observed data, the conditional expected value and 
conditional variance of $p_k^*$, the proportion estimates of all possible categories based on 
the imputed values only, are 

\[ E[p_k^* \mid \text{data}] = \hat{p}_k, \qquad \mathrm{Cov}[p_k^* \mid \text{data}] = \frac{1}{m_k}\left(\mathrm{diag}(\hat{p}_k) - \hat{p}_k \hat{p}_k^T\right), \]

where $m_k$ is the number of missing values in the k-th imputation class. 

(ii) Resampling using simple random sampling without replacement: Within the k-th 
imputation class, each observed value is used only once as an imputed value. However, 
when $m_k > r_k$, where $r_k$ is the number of observed values in the class, all observed 
values are used as many times as possible and then a simple random sample is taken from 
the observed values without replacement and those values are used as imputed values for 
the remainder of the nonrespondents. Here, we only consider $m_k \le r_k$. In this case, the 
distribution of the frequencies of the imputed values in each category is hypergeometric. 
The conditional expectation given the data is the same as in (i), whereas the conditional 
variance-covariance matrix is given by 

\[ \mathrm{Cov}[p_k^* \mid \text{data}] = \frac{1}{m_k} \cdot \frac{r_k - m_k}{r_k - 1}\left(\mathrm{diag}(\hat{p}_k) - \hat{p}_k \hat{p}_k^T\right). \]



(iii) Randomized strategy using maximum likelihood estimates: Let the proportion 
estimate based on the observed data be $\hat{p}_k = (\hat{p}_{1k}, \ldots, \hat{p}_{jk}, \ldots, \hat{p}_{Jk})^T$; then the estimated 
frequency is $m_k \hat{p}_k = (m_k \hat{p}_{1k}, \ldots, m_k \hat{p}_{jk}, \ldots, m_k \hat{p}_{Jk})^T$. Category j is assigned as the 
imputed value to $[m_k \hat{p}_{jk}]$ missing cases, which leaves 

\[ c_k = \sum_{j=1}^{J} \left(m_k \hat{p}_{jk} - [m_k \hat{p}_{jk}]\right) = \sum_{j=1}^{J} c_{jk} \]

missing values unimputed in the k-th imputation class, where $[m_k \hat{p}_{jk}]$ is the largest 
integer not exceeding $m_k \hat{p}_{jk}$. The imputed values for these remaining missing values 
are independently selected from a multinomial distribution with parameter vector $c'_k$, where 
$c'_{jk} = c_{jk} / c_k$. The conditional expectation of the imputed proportion is the same as 
before, but the conditional variance-covariance matrix is given by 

\[ \mathrm{Cov}[p_k^* \mid \text{data}] = \frac{c_k}{m_k^2}\left(\mathrm{diag}(c'_k) - c'_k (c'_k)^T\right). \]

Method (i) is strictly stochastic and acts to increase the variability of statistics computed from an 
imputed data set compared to a deterministic method. Both method (ii) and method (iii) may be 
deterministic. Method (ii) is deterministic when the number of observations equals the number of 
missing values. Method (iii) is deterministic when the $m_k \hat{p}_{jk}$ are integers for each imputation class. 

In general, method (ii) adds more variability than method (iii), and method (i) adds 
more variability than method (ii). However, all of them add less variability than the ABB and the 
BB imputation methods. 
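
The three within-class draws can be sketched as follows for a single imputation class with categorical data. This is only an illustration under stated assumptions: the function names, the use of Python's `random.Random`, and the final shuffling of the imputed values are choices made here and do not come from Gimotty and Brown (1990).

```python
import math
import random
from collections import Counter


def impute_with_replacement(observed, m, rng):
    """Method (i): draw m imputed values from the donors with replacement."""
    return [rng.choice(observed) for _ in range(m)]


def impute_without_replacement(observed, m, rng):
    """Method (ii): use every donor as many full rounds as possible, then take
    a simple random sample without replacement for the remaining cases."""
    full_rounds, remainder = divmod(m, len(observed))
    imputed = list(observed) * full_rounds + rng.sample(observed, remainder)
    rng.shuffle(imputed)  # assumed: assignment order to nonrespondents is random
    return imputed


def impute_randomized_ml(observed, m, rng):
    """Method (iii): assign floor(m * p_j) cases to category j, then draw the
    leftover cases from a multinomial proportional to the fractional parts."""
    r = len(observed)
    counts = Counter(observed)
    categories = sorted(counts)
    expected = [m * counts[c] / r for c in categories]
    floors = [math.floor(e) for e in expected]
    imputed = [c for c, f in zip(categories, floors) for _ in range(f)]
    leftover = m - sum(floors)
    if leftover > 0:
        fractions = [e - f for e, f in zip(expected, floors)]
        imputed += rng.choices(categories, weights=fractions, k=leftover)
    rng.shuffle(imputed)
    return imputed
```

When the number of missing cases equals the number of donors, the second and third functions reproduce the observed category counts exactly, which matches the deterministic cases described above.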

1.3 Model-based deterministic imputation methods 

Generally, “correctly” modeling missing data must be the data constructor’s responsibility 
because he/she typically knows more about reasons for nonresponse and has access to 
confidential and detailed information not released for public use. Model-based approaches will 
produce more accurate imputations than randomization-based approaches if the model 
assumptions are satisfied. But the difficulty with model-based approaches is that those 
assumptions are usually unverifiable in practice and therefore it may not be easy to choose an 
appropriate model-based imputation approach for a typical survey. A good model-based 
approach would work well for a wide range of underlying data distributions and missing 
mechanisms. 

1.3.1 Ratio imputation 

Suppose that an auxiliary variable x closely related to the target variable y is observed on all 
sample units. Ratio imputation uses 

\[ y_{hi}^* = \frac{\bar{y}_{rh}}{\bar{x}_{rh}}\, x_{hi} \]

as the imputed value for the i-th nonrespondent in the h-th imputation class, where $\bar{y}_{rh}$ and 
$\bar{x}_{rh}$ are the respondent means of y and x in that class. This method can be motivated by 
the fact that $y_{hi}^*$ is the best predictor under the following “ratio” superpopulation model: 

\[ E(y_{hi}) = \beta_h x_{hi}, \quad V(y_{hi}) = \sigma_h^2 x_{hi}, \quad \mathrm{Cov}(y_{hi}, y_{hj}) = 0 \ (i \ne j), \]

provided that the model holds for both the respondents and nonrespondents. 

The ratio imputation method may provide very accurate imputations if the missingness of y 
mainly depends on a highly correlated auxiliary variable x. But this is a very restrictive 
assumption. In practice, missing values are more likely to depend on several auxiliary variables. 
Since ratio imputation can use only one auxiliary variable, it is not fully efficient in many 
situations. One way around this is to use some auxiliary variables as classification variables, but 
this is still not a satisfactory solution to the limitation on the efficient use of auxiliary variables. As 
the number of classification variables increases, the number of imputation classes quickly 
becomes enormous and then some imputation classes may not have sufficient samples to obtain 
fairly accurate ratio estimates. 
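
As a minimal sketch of ratio imputation within one imputation class, the hypothetical function below marks missing y values with `None` and fills them with the respondent ratio of means times x. The function name and the `None` convention are assumptions made for illustration.

```python
def ratio_impute(x, y):
    """Fill missing y (None) with (respondent mean of y / respondent mean of x) * x."""
    respondents = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    # The ratio of means equals the ratio of sums over the same respondents.
    ratio = sum(yi for _, yi in respondents) / sum(xi for xi, _ in respondents)
    return [yi if yi is not None else ratio * xi for xi, yi in zip(x, y)]


completed = ratio_impute([1, 2, 3, 4], [2, 4, None, 8])
```

To impute within several classes, the same function would be applied to each class separately, which is where sparse classes start to give unstable ratios.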

1.3.2 Predicted regression imputation 

This method uses the predicted values from a regression model as imputations for all missing 
cases. The predicted value $\hat{y}_i$ is the best predictor of the i-th unobserved value $y_i$ under the 
following superpopulation model: 

\[ E(y_i) = \alpha + \beta' x_i, \quad V(y_i) = \sigma^2, \quad \mathrm{Cov}(y_i, y_j) = 0 \ (i \ne j), \]

provided that the model holds for both the respondents and the nonrespondents. Predicted 
regression imputation may also be performed within each imputation class. The disadvantage of 
this method is the shrinkage-to-the-mean phenomenon. 

1.3.3 EM algorithm 

The EM algorithm (Dempster, Laird, and Rubin 1977) consists of two steps: the E-step 
calculates the expectation of the complete-data sufficient statistics given the observed data and 
current parameter estimates, and the M-step updates the parameter estimates through the 
maximum likelihood approach based on the current values of the complete-data sufficient statistics. 
The algorithm then proceeds in an iterative manner until the difference between the last two 
consecutive parameter estimates falls below a specified criterion. The final E-step calculates 
the expectation of each missing value given the final parameter estimates and the observed data; 
this will be used as the imputation value. 

Although the EM algorithm can be used to impute each individual missing value, it is more often 
used to directly obtain estimates for population parameters. Assuming a normal distribution for 
the data, both the expectations of the sufficient statistics in the E-step and the maximum 
likelihood estimates of the parameters in the M-step are easy to derive. But it may not be easy 
to do so with other distributions. Convergence may be slow, and is not guaranteed, with the EM 
algorithm, especially with sparse data. If each M-step also requires an iterative process to obtain 
the maximum likelihood estimates, convergence will be slowed further. This 
method also suffers from the shrinkage-to-the-mean phenomenon. The advantage of the EM algorithm 
is its stable convergence; that is, iterations always increase the likelihood. 
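
The E-step and M-step can be illustrated with a deliberately small case: a bivariate normal pair (x, y) in which x is fully observed and y is missing for some cases (marked `None`). The sketch below is an assumption-laden illustration, not the implementation evaluated in this report; the function name, starting values, and stopping rule are chosen here for clarity.

```python
def em_bivariate_normal(x, y, tol=1e-8, max_iter=1000):
    """EM for a bivariate normal (x, y) where x is complete and y has gaps (None)."""
    n = len(x)
    observed = [yi for yi in y if yi is not None]
    mean_x = sum(x) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    # Starting values (assumed): available-case moments and zero covariance.
    mean_y = sum(observed) / len(observed)
    var_y = sum((yi - mean_y) ** 2 for yi in observed) / len(observed)
    cov_xy = 0.0
    for _ in range(max_iter):
        slope = cov_xy / var_x
        intercept = mean_y - slope * mean_x
        resid_var = var_y - slope * cov_xy
        # E-step: expected y and y^2 for each case given x and current estimates.
        e_y = [yi if yi is not None else intercept + slope * xi
               for xi, yi in zip(x, y)]
        e_y2 = [yi ** 2 if yi is not None
                else (intercept + slope * xi) ** 2 + resid_var
                for xi, yi in zip(x, y)]
        # M-step: maximum likelihood estimates from the expected sufficient statistics.
        new_mean_y = sum(e_y) / n
        new_var_y = sum(e_y2) / n - new_mean_y ** 2
        new_cov_xy = sum(xi * ei for xi, ei in zip(x, e_y)) / n - mean_x * new_mean_y
        change = max(abs(new_mean_y - mean_y), abs(new_var_y - var_y),
                     abs(new_cov_xy - cov_xy))
        mean_y, var_y, cov_xy = new_mean_y, new_var_y, new_cov_xy
        if change < tol:
            break
    # Final E-step: the conditional mean of each missing y is the imputed value.
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    imputed = [yi if yi is not None else intercept + slope * xi
               for xi, yi in zip(x, y)]
    return imputed, {"mean_y": mean_y, "var_y": var_y, "cov_xy": cov_xy}
```

Because every imputed value sits exactly on the fitted line, the completed data understate the spread of y, which is the shrinkage effect noted above. With 40 percent of y missing, this example needs well over a hundred iterations, illustrating the slow convergence.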

1.3.4 Dear’s principal component method (DPC) 

The imputation strategy using the principal component method consists of three steps: 

(D1) Let $R = \{r_{ij}\}$ be an $n \times p$ missingness indicator matrix for variables $X_1, \ldots, X_p$ with n 
observations, i.e., $r_{ij} = 0$ or 1 according to whether $x_{ij}$ is missing or observed. Use all 
available cases to calculate the sample mean and variance for each variable, and then 
standardize X to Z. Next, use the case-wise-deletion method (delete the whole case if 
one variable has a missing value on that case) to obtain the correlation matrix, S. 



(D2) Calculate the largest eigenvalue of S, and its associated eigenvector 

\[ v_1 = (v_{11}, \ldots, v_{1p}). \]

(D3) Let the first principal component for the i-th case be 

\[ T_i = \sum_{j=1}^{p} r_{ij} v_{1j} z_{ij}, \]

so that the points on the first principal component line that are closest to the i-th case 
replace the missing variables: 

\[ z_{ij}^* = T_i v_{1j} \quad \text{for each } j \text{ with } r_{ij} = 0. \]

Repeat (D3) for all cases with missing variables and convert Z* back to X*. 

One desirable property of principal component analysis is that it does not require any 
distributional assumptions for its use. However, since the case-wise-deletion method is used to 
obtain the correlation matrix S, DPC works poorly for data sets with only a few complete 
cases. 

1.3.5 General iterative principal component (GIP) method 

To avoid the problems mentioned above and make DPC a general purpose method, the 
following refinements have been introduced. 

(G1) Use the all-available-data method to calculate S. If S is not positive definite, modify it 
with the algorithm provided by Huseby, Schwertman, and Allen (1980); or replace all 
missing values by the mean and use n - m - 1 instead of n - 1 as the denominator in the 
variance-covariance calculations to obtain S. 

(G2) Perform D2 and D3 with S obtained from G1. 

(G3) Recalculate S from the imputed data matrix and repeat G2. 

(G4) Cycle iteratively through G3 and G2 until successive imputed values do not change 
materially. 

1.3.6 Singular value decomposition (SVD) method 

Singular value decomposition (SVD) can be used in a simple way to impute missing 
values (Krzanowski 1988). The method is easy to compute; the steps for 
one missing value $x_{ij}$ in X are as follows: 

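
(S1) Omit the i-th case (row) from X and calculate the SVD of the remaining $(n-1) \times p$ data 
matrix, denoted by $X^{-i} = \bar{U} \bar{D} \bar{V}'$ with $\bar{U} = \{\bar{u}_{st}\}$, $\bar{V} = \{\bar{v}_{st}\}$, and 
$\bar{D} = \mathrm{diag}(\bar{d}_1, \ldots, \bar{d}_p)$, where $\bar{U}$ and $\bar{V}$ have orthonormal columns (i.e., 
$\bar{U}'\bar{U} = \bar{V}'\bar{V} = I$). 

(S2) Omit the j-th variable (column) from X and calculate the SVD of the remaining $n \times (p-1)$ 
data matrix, denoted by $X_{-j} = \tilde{U} \tilde{D} \tilde{V}'$ with $\tilde{U} = \{\tilde{u}_{st}\}$, $\tilde{V} = \{\tilde{v}_{st}\}$, and 
$\tilde{D} = \mathrm{diag}(\tilde{d}_1, \ldots, \tilde{d}_{p-1})$. 

(S3) Impute the (i, j)-th missing value with 

\[ x_{ij}^* = \sum_{t=1}^{p-1} \left(\tilde{u}_{it} \sqrt{\tilde{d}_t}\right)\left(\sqrt{\bar{d}_t}\, \bar{v}_{jt}\right). \]
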

In the case where there is more than one missing value, an iterative scheme can be conducted as 
follows: start with any initial imputed values such as the mean, and update each initial imputed 
value in turn using S3. The process is then iterated until stability is achieved in the imputed 
values. 

1.3.7 A comparison of AMS, EM, DPC, GIP, and SVD 

Bello (1993) conducted a simulation study to compare five deterministic imputation 
methods: adjusted mean substitution (AMS), the EM algorithm, DPC, GIP, and SVD. In the 
study, Bello’s two simulation populations are the multivariate normal $N_p(\mu, \Sigma)$ and the 
multivariate t-distribution with 4 degrees of freedom, $T_p(4, \mu, \Sigma)$, where $\mu = 0$ and 
$\Sigma = V \Lambda V'$. V is a randomly generated orthogonal matrix and 
$\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$, with $\lambda_i = w v^{i-1} + 0.1$ as used by Bendel (1978), where 

\[ w = \begin{cases} (c - 0.1p)(1 - v)/(1 - v^p), & 0 < v < 1, \\ c/p - 0.1, & v = 1, \end{cases} \]

and c is the trace of $\Sigma$. Evidently, values of v represent a continuum such that the 
interdependence among the variables increases as v decreases from 1 to 0. The variables are 
independent when v = 1. 
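
The eigenvalue structure can be checked numerically. The helper below is a hypothetical illustration (its name and arguments are not from Bello or Bendel); it returns the eigenvalues for given p, v, and trace c, and their sum should equal c for any admissible v.

```python
def bendel_eigenvalues(p, v, c):
    """Eigenvalues lambda_i = w * v**(i-1) + 0.1 whose sum equals the trace c."""
    if v == 1:
        w = c / p - 0.1
    else:
        w = (c - 0.1 * p) * (1 - v) / (1 - v ** p)
    return [w * v ** i + 0.1 for i in range(p)]


strong = bendel_eigenvalues(5, 0.3, 5.0)   # one dominant eigenvalue
flat = bendel_eigenvalues(5, 1, 5.0)       # all eigenvalues equal
```

A small v concentrates the trace in the first eigenvalue, which is why interdependence among the variables grows as v falls toward 0.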

Other varying factors are sample size (n), dimensionality (p), interdependence among the 
variables (v), and missing rate ($\gamma$). Missing data are created randomly, which results in 
the ideal missing mechanism, missing completely at random (MCAR). The number of Monte Carlo 
simulations for each combination of n, p, v, and $\gamma$ was fixed at 100. The mean square error 
(Euclidean norm) of the estimators of $\Sigma$ over the 100 simulations is used as the main 
comparison criterion. For the estimator of the mean vector, all the imputation methods are 
similar since the data are missing completely at random. 

The primary findings of Bello’s study are as follows: 

For multivariate normal distributions: 

• When the variables are nearly independent (v = 0.7) and p < 10, AMS outperforms 
the other four regression-like imputation methods. The EM algorithm is the second best, 
followed by DPC, SVD, and GIP. This is not surprising since the mean imputations are 
obtained under the assumption that the variables are uncorrelated. 

• For p > 2 and v < 0.3, the regression-like imputation techniques show appreciable 
superiority over the adjusted mean imputation method. 

• When the missing rate $\gamma$ > 0.10 and n becomes large (> 100), EM is, on average, the 
best technique, followed by GIP, SVD, AMS, and DPC. 

For multivariate t-distributions: 

• Although the principal component and singular value decomposition methods can be 
presumed to be distributional-assumption-free, this does not mean that DPC, GIP, and 
SVD are robust to structures in data. 

• When v = 0.7, the imputation methods behave similarly to their normal counterparts. 

• EM, which depends on a normality assumption, runs neck and neck with the 
distribution-free techniques DPC, GIP, and SVD. When n is sufficiently large (200) 
and the variables are strongly dependent (v < 0.3) with moderate dimensionality (p = 5), 
EM outperforms the other imputation techniques. On the other hand, when p = 2 and 
v = 0.3, for any n value, GIP is the most efficient method. 

• When p increases, n increases, and v decreases, the regression-like methods become 
increasingly better than AMS. 

• There is insufficient evidence to discredit the use of EM when the data 
deviate markedly from normality, especially when p > 2 and reasonably moderate-to-high 
interdependence exists among the variables. This remark implicitly suggests that 
whatever is known to affect EM (for example, outliers) may also affect the other 
imputation techniques. 

Regarding the computer time used by these imputation techniques, AMS and DPC are 
non-iterative techniques and no special computer time is required. Among the three iterative 
methods, the convergence rate of EM was observed to be the slowest, followed by SVD, 
and then GIP. 

Although the performances of the methods are compared based on the artificial assumption, 
MCAR, these results can still be used as references. 

1.4 Model-based random imputation methods 

1.4.1 Draw imputations from predicted distributions 

If some information about the type of data distribution is available, imputations can be drawn 
from a predicted distribution. This method assumes a distribution for the data and uses the 
observed data to estimate the unknown parameters in the assumed distribution. If the 
distribution assumption is approximately true, this method will give much better imputations than 
any method which draws imputations from observed data. Rubin’s example (Rubin 1978) can 
illustrate this. Suppose a sample of 1000 units has 500 respondents and 500 nonrespondents. 
The 500 respondents look like a half-normal. If we learn from other sources that the population 
is approximately normal, then we can use the data of the 500 respondents to obtain the mean 
and variance estimates, and draw imputations from the normal distribution with the estimated 
mean and variance. This makes it possible to recover the other half of the normal distribution. 
Although this is an extremely artificial example, it is possible in real applications that data of 
some specific categories are totally or mostly missing. In those cases, methods that draw 
imputations from observed data will not be able to recover missing values for those categories, 
while drawing imputations from a predicted distribution may be able to recover them. The 
disadvantage of this method is that it requires information in order to develop an appropriate 
distribution assumption. 

1.4.2 Random regression imputation 

As stated in section 1.3.2, predicted regression imputation suffers from the shrinkage-to-the-mean 
phenomenon. Small random disturbances can be added to the predicted values as imputations 
to increase variability. The small random disturbance may be drawn using the following methods: 

(1) draw a random disturbance from a distribution such as $N(0, \hat{\sigma}^2)$, with mean 0 and 
variance $\hat{\sigma}^2$ obtained from the observed data; 

(2) draw a random disturbance from respondents’ residuals of the regression model; 

(3) draw a random disturbance from the residuals of those respondents which have 
similar values on some selected auxiliary variables, to protect against non-linearity and 
non-additivity in regression models. 
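
Method (2) can be sketched with a single auxiliary variable. The code below is an illustrative assumption rather than a procedure taken from the report: it fits a simple least-squares line on the respondents, then adds a randomly chosen respondent residual to each predicted value. The function names and the `None` convention for missing values are hypothetical.

```python
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def random_regression_impute(x, y, rng):
    """Impute missing y (None) as predicted value plus a respondent residual."""
    respondents = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_simple_ols([xi for xi, _ in respondents],
                                      [yi for _, yi in respondents])
    residuals = [yi - (intercept + slope * xi) for xi, yi in respondents]
    return [yi if yi is not None
            else intercept + slope * xi + rng.choice(residuals)
            for xi, yi in zip(x, y)]
```

Method (1) would replace `rng.choice(residuals)` with a normal draw such as `rng.gauss(0.0, sigma_hat)`, and method (3) would restrict `residuals` to respondents with similar auxiliary values.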

1.4.3 Ratio with random disturbance imputation 



We can add a small random disturbance to the imputed values obtained from a ratio imputation 
model (see section 1.3.1) as was done above to the predicted regression imputation. The 
random disturbance can be drawn using three methods parallel to those described above. 

1.4.4 Modeling non-ignorable missing mechanism 

Most imputation methods model the target variable with missing values but not the missing 
indicator variable. These methods explicitly or implicitly assume that the missing values occur at 
random given the conditional auxiliary variables. Greenlees, Reece, and Zieschang (1982) try to 
model both the target variable and its missing indicator variable for a non-ignorable missing 
mechanism, which allows the missingness to depend on the target variable itself. 

Let Y be the target variable with missing values, X be the auxiliary variables for predicting Y, R 
be the response indicator, and Z be the auxiliary variables for predicting R. X and Z may 
overlap. Then the imputation model employed is: 

\[ Y_i = X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \]

\[ P(R_i = 1 \mid Y_i, Z_i) = 1 / [1 + \exp(-\alpha - \gamma Y_i - \delta Z_i)]. \]

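The latter equation indicates that the response probability of Y depends on Y itself. Then the 
likelihood for the i-th respondent is given by 

\[ L_i = \frac{1}{1 + \exp(-\alpha - \gamma Y_i - \delta Z_i)} \cdot \frac{1}{\sigma}\, \phi\!\left(\frac{Y_i - X_i \beta}{\sigma}\right), \]

and the likelihood for the i-th nonrespondent is given by 

\[ L_i = \int_{-\infty}^{+\infty} \left[1 - \frac{1}{1 + \exp(-\alpha - \gamma Y - \delta Z_i)}\right] \frac{1}{\sigma}\, \phi\!\left(\frac{Y - X_i \beta}{\sigma}\right) dY, \]

where $\phi$ is the standard normal density. The maximum likelihood estimates for $\alpha$, $\beta$, $\gamma$, 
$\delta$, and $\sigma$ are obtained by maximizing the whole sample likelihood $L = \prod_{i=1}^{n} L_i$. The 
solution to this maximization problem may be found through the generalized Gauss-Newton algorithm. 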

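We may impute the missing values using the mean of the distribution of Y conditional on 
nonresponse, the values of X and Z, and the parameter estimates $\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$, $\hat{\delta}$, and $\hat{\sigma}$. This mean 
can be calculated in a straightforward way using numerical integration: 

\[ E(Y_i \mid X_i, Z_i, R_i = 0) = \frac{\displaystyle\int_{-\infty}^{+\infty} Y \left[1 - \frac{1}{1 + \exp(-\hat{\alpha} - \hat{\gamma} Y - \hat{\delta} Z_i)}\right] \frac{1}{\hat{\sigma}}\, \phi\!\left(\frac{Y - X_i \hat{\beta}}{\hat{\sigma}}\right) dY}{\displaystyle\int_{-\infty}^{+\infty} \left[1 - \frac{1}{1 + \exp(-\hat{\alpha} - \hat{\gamma} Y - \hat{\delta} Z_i)}\right] \frac{1}{\hat{\sigma}}\, \phi\!\left(\frac{Y - X_i \hat{\beta}}{\hat{\sigma}}\right) dY}. \]
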

Alternatively, to avoid the shrinkage to the mean phenomenon, we may use the following 
imputation scheme. 

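(1) Draw $\varepsilon_i$ from N(0, 1) and a uniform random number $\eta$ from U[0, 1]. 

(2) Calculate $\hat{Y}_i = X_i \hat{\beta} + \hat{\sigma} \varepsilon_i$ and 

\[ \Pr(R_i = 0 \mid \hat{Y}_i, Z_i) = 1 - \frac{1}{1 + \exp(-\hat{\alpha} - \hat{\gamma} \hat{Y}_i - \hat{\delta} Z_i)}. \]

(3) If $\Pr(R_i = 0 \mid \hat{Y}_i, Z_i) > \eta$, impute $\hat{Y}_i$ for the i-th missing case; otherwise redo 
(1) and (2). 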

If the model of the missing indicator variable is approximately satisfied, this method should give 
better imputations than usual imputation methods. However, that is an unverifiable assumption in 
real applications and the extra model makes it less robust for general imputation purposes. This 
method may not be recommended if there is no strong evidence to show that the missing 
mechanism is confounded, that is, the missingness of Y depends on Y itself. 
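
The three-step random scheme is an accept-reject draw and can be sketched for one missing case as below. Everything named here is an assumption for illustration: the parameter estimates are passed in as plain numbers, `x_beta` stands for the fitted linear predictor of that case, Z is taken to be a single variable, and `max_tries` is a safeguard that the original scheme does not mention.

```python
import math
import random


def draw_nonignorable_imputation(x_beta, z, alpha, gamma, delta, sigma,
                                 rng, max_tries=10000):
    """Draw one imputed Y for a nonrespondent under the logistic response model."""
    for _ in range(max_tries):
        eps = rng.gauss(0.0, 1.0)          # step (1): standard normal disturbance
        eta = rng.random()                 # step (1): uniform number on [0, 1)
        y_hat = x_beta + sigma * eps       # step (2): candidate value
        p_missing = 1.0 - 1.0 / (1.0 + math.exp(-alpha - gamma * y_hat - delta * z))
        if p_missing > eta:                # step (3): accept, otherwise redraw
            return y_hat
    raise RuntimeError("no candidate accepted; check the parameter estimates")
```

With a positive $\hat{\gamma}$, low values of Y are more likely to be missing, so accepted draws fall below the regression prediction on average, which an ignorable model would not capture.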

1.5 Imputation methods related to Bayesian theories 

1.5.1 Data augmentation 

This Bayesian iterative method was proposed by Tanner and Wong (1987). It assumes two 
distributions: the distribution of the data and the prior distribution of the parameters. Similar to 
the EM algorithm, it consists of two steps: (1) the I-step (imputation step) draws imputations for the 
missing values from the predicted distribution of the data, using current parameter estimates; (2) 
the P-step (parameter estimation step) draws parameter estimates from their posterior distribution, 
using both the observed and imputed data. To start this iterative process, we may use the EM 
algorithm to obtain initial parameter estimates for the first I-step. 

Schafer’s software (Schafer 1997) implements this method using models for continuous data, 
categorical data, and mixed continuous and categorical data. 

• For continuous data, this software assumes a multivariate normal distribution for the data, 
and a normal prior for the mean parameters and a normal-inverted Wishart for the 
variance-covariance parameters. 

• For categorical data, this software assumes a multinomial distribution for the data and a 
Dirichlet prior distribution for the parameters. In cases where the number of parameters 
becomes enormous, the software imposes loglinear constraints (Bishop, Fienberg, and 
Holland 1975) on the parameters. 

• For mixed continuous and categorical data, the software employs a general location model 
(Olkin & Tate 1961). It assumes a multinomial distribution for the categories defined by the 
categorical variables. Within each category, the continuous variables are assumed to have a 
multivariate normal distribution. The prior for the parameters in the multinomial distribution is 
Dirichlet and that for the parameters in the multivariate normal distribution is Jeffreys’ 
non-informative prior. To reduce the number of parameters, a loglinear constraint can be imposed on the 
multinomial parameters and a linear constraint on the mean parameters of the multivariate 
normal distribution. 

The data augmentation procedure approximates the actual posterior distribution of the 
parameter vector by a mixture of complete data posteriors. Their method of constructing the 
complete data sets is closely related to the Gibbs sampler (Geman and Geman 1984). This 
method efficiently uses relationships among variables for constructing imputations. It generally 
gives both good point estimates and variance estimates if the distribution assumptions on the 
data are approximately satisfied. Under simple random sampling, the data augmentation method 
provides “proper” multiple imputations in the sense of Rubin (1987). The disadvantage of the 
data augmentation method is that it requires iterations and, similar to the EM algorithm, 
convergence can be slow. 
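
The alternation of I-step and P-step can be illustrated with the simplest possible case, a single normal variable with some values missing (marked `None`). The sketch below is not Schafer's software: it assumes a univariate normal model with the standard noninformative prior on the mean and variance, starts from available-case estimates rather than EM, and uses hypothetical names throughout.

```python
import random
import statistics


def data_augmentation_normal(y, n_iter, rng):
    """Run n_iter >= 1 I-step/P-step cycles; return completed data, mu, sigma2."""
    observed = [v for v in y if v is not None]
    n, n_mis = len(y), len(y) - len(observed)
    mu = statistics.mean(observed)
    sigma2 = statistics.variance(observed)
    draws = []
    for _ in range(n_iter):
        # I-step: draw the missing values from N(mu, sigma2) at current parameters.
        draws = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n_mis)]
        completed = observed + draws
        # P-step: draw sigma2 from SS / chi-square(n - 1), then mu given sigma2.
        ybar = statistics.mean(completed)
        ss = sum((v - ybar) ** 2 for v in completed)
        sigma2 = ss / rng.gammavariate((n - 1) / 2.0, 2.0)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
    fill = iter(draws)
    return [v if v is not None else next(fill) for v in y], mu, sigma2
```

Running the function several times with independent random streams, each after enough cycles, would give several completed data sets of the kind used for multiple imputation.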

1.5.2 Adjusted data augmentation 

If the distribution assumption in the data augmentation method is in question, it is desirable to let 
the observed data $Y_{obs}$ influence the shape of the distribution of values imputed for $Y_{mis}$. Rubin 
and Schenker (1986) adjusted the normal model implemented in Schafer’s software as follows. 
First, the parameters $\mu^*$ and $\sigma^{*2}$ are obtained in the same way as in the data augmentation 
method. Second, the components of the m-dimensional vector $X = (X_1, \ldots, X_m)$ are drawn with 
replacement from the observed data $Y_{obs}$. Under repeated draws from $Y_{obs}$, the standardized 
variable 

\[ Z_i = (X_i - \bar{y}_r) \Big/ \sqrt{(r - 1) s_r^2 / r} \]

has expected value 0 and variance 1, where $\bar{y}_r$ and $s_r^2$ are the mean and variance of the r 
observed values. Finally, the m missing values $Y_{mis}$ are imputed using 
$\mu^* + \sigma^* Z_i$, i = 1, 2, ..., m. 
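
The adjustment can be sketched as a short function that takes already drawn values of $\mu^*$ and $\sigma^*$ and returns m imputations whose shape follows the observed data. The function name and argument names are hypothetical, and the parameter draws themselves are assumed to come from a separate data augmentation step.

```python
import random


def adjusted_imputations(observed, m, mu_star, sigma_star, rng):
    """Impute m values as mu* + sigma* * Z, with Z standardized donor values."""
    r = len(observed)
    ybar = sum(observed) / r
    s2 = sum((v - ybar) ** 2 for v in observed) / (r - 1)
    scale = ((r - 1) * s2 / r) ** 0.5      # standard deviation of one random donor
    donors = [rng.choice(observed) for _ in range(m)]   # drawn with replacement
    return [mu_star + sigma_star * (x - ybar) / scale for x in donors]
```

Because Z is built from actual donor values, skewness or discreteness in the observed data carries over to the imputations, while the location and scale still come from the drawn parameters.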

1.5.3 Sequential imputation method 



Kong, Liu and Wong (1994) propose a sequential imputation procedure that involves imputing 
the missing data sequentially. According to the authors, in many applications the sequential 
imputation method can work well without the need for iterations. 

To describe the method, let $\theta$ be the parameter vector of interest and Y be the complete data. 
Suppose the complete-data posterior distribution $p(\theta \mid Y)$ is simple. Suppose the real data Y can 
be decomposed into 

\[ Y = (Y_r, Y_m) = (Y_{r1}, Y_{m1}, \ldots, Y_{rn}, Y_{mn}), \]

where $Y_{rt}$ and $Y_{mt}$ (t = 1, 2, ..., n) are the response and nonresponse variables in the t-th 
observation. The missing variables may be different for different observations. The main goal is 
to find the posterior distribution $p(\theta \mid Y_r)$: 

\[ p(\theta \mid Y_r) = \int p(\theta \mid Y)\, p(Y_m \mid Y_r)\, dY_m = E_{Y_m \mid Y_r}\left[p(\theta \mid Y)\right]. \]

If we can draw M independent copies of $Y_m$ from the conditional distribution $p(Y_m \mid Y_r)$, then 
$p(\theta \mid Y_r)$ can be approximated by 

\[ \frac{1}{M} \sum_{j=1}^{M} p(\theta \mid Y(j)), \]

where $Y(j) = (Y_r, Y_m(j))$ and $Y_m(j)$ is the j-th imputation for the missing part $Y_m$. However, drawing 
imputations directly from the conditional distribution $p(Y_m \mid Y_r)$ is usually difficult. The Gibbs sampler 
or the data augmentation procedure does this approximately by iterations. 

The sequential imputation method achieves something similar by imputing the $Y_{mt}$’s sequentially 
and using importance sampling weights to avoid iterations. The sequential imputation starts by 
drawing $Y_{m1}^*$ from $p(Y_{m1} \mid Y_{r1})$ and computing $w_1 = p(Y_{r1})$. Then for t = 2, ..., n, the following two 
steps are done sequentially: 

(1) Draw $Y_{mt}^*$ from the conditional distribution $p(Y_{mt} \mid Y_{r1}, Y_{m1}^*, \ldots, Y_{r,t-1}, Y_{m,t-1}^*, Y_{rt})$; 

(2) Compute the predictive probabilities $p(Y_{rt} \mid Y_{r1}, Y_{m1}^*, \ldots, Y_{r,t-1}, Y_{m,t-1}^*)$ and 

\[ w_t = w_{t-1}\, p(Y_{rt} \mid Y_{r1}, Y_{m1}^*, \ldots, Y_{r,t-1}, Y_{m,t-1}^*). \qquad (1.1) \]

Let $w = w_n$, so that 

\[ w = p(Y_{r1}) \prod_{t=2}^{n} p(Y_{rt} \mid Y_{r1}, Y_{m1}^*, \ldots, Y_{r,t-1}, Y_{m,t-1}^*). \]

Both steps are required to be computationally simple, which is often the case if the predictive 
distributions $p(Y_1)$ and $p(Y_t \mid Y_1, \ldots, Y_{t-1})$ are simple. This is the key to the feasibility of sequential 
imputation. 

We can independently repeat the above process M times to draw M sets of imputations and 
weights, denoted as $Y_m^*(j)$ and w(j) respectively (j = 1, 2, ..., M). Then the posterior 
distribution $p(\theta \mid Y_r)$ is estimated by 

\[ \frac{1}{W} \sum_{j=1}^{M} w(j)\, p(\theta \mid Y_r, Y_m^*(j)), \qquad (1.2) \]

which is easy to compute under the assumption that the complete-data posterior is simple, 
where $W = \sum_{j=1}^{M} w(j)$. 



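To understand why (1.2) is the appropriate approximation, we note that each independent 
imputation $Y_m^*(j)$ is not drawn from the actual conditional distribution $p(Y_m \mid Y_r)$, but from the 
“trial density” 

\[ p^*(Y_m \mid Y_r) = p(Y_{m1} \mid Y_{r1}) \prod_{t=2}^{n} p(Y_{mt} \mid Y_{r1}, Y_{m1}, \ldots, Y_{r,t-1}, Y_{m,t-1}, Y_{rt}). \]

Using standard results from importance sampling, we should use weights 

\[ w'(j) = \frac{p(Y_m^*(j) \mid Y_r)}{p^*(Y_m^*(j) \mid Y_r)} = \frac{p(Y_r, Y_m^*(j))}{p(Y_r)} \cdot \frac{p(Y_{r1})}{p(Y_{r1}, Y_{m1}^*(j))} \prod_{t=2}^{n} \frac{p(Y_{r1}, Y_{m1}^*(j), \ldots, Y_{r,t-1}, Y_{m,t-1}^*(j), Y_{rt})}{p(Y_{r1}, Y_{m1}^*(j), \ldots, Y_{rt}, Y_{mt}^*(j))} \]

\[ = \frac{p(Y_{r1})}{p(Y_r)} \prod_{t=2}^{n} p(Y_{rt} \mid Y_{r1}, Y_{m1}^*(j), \ldots, Y_{r,t-1}, Y_{m,t-1}^*(j)) = \frac{w(j)}{p(Y_r)}, \]

which is proportional to w(j) since $p(Y_r)$ is the same for all M imputations. This implies that the w(j) 
(j = 1, ..., M) are correct weights and (1.2) is an appropriate approximation. 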

In sequential imputation, it is generally desirable to have the trial distribution $p^*(Y_m \mid Y_r)$ as close 
to the true distribution $p(Y_m \mid Y_r)$ as possible. This usually means that the complete cases should 
be processed first, and the other cases should be processed in order of increasing missingness 
so that missing values are imputed conditioned on as many of the $Y_t$ as possible. One advantage of 
sequential imputation is that this method can impute data sequentially even when the data are 
collected at different times, for example, in medical studies. 

In situations where we want to compare models, it will be important to get the likelihood of 
different models. For a particular model H, the likelihood of H given the incomplete data $Y_r$ is 

\[ p_H(Y_r) = \int p_H(Y_r \mid \theta)\, \pi_H(\theta)\, d\theta. \]

Suppose that we have applied sequential imputation based on model H. Then for all j we have 

\[ 1 = E_{p^*}[w'(j)] = E_{p^*}[w(j) / p(Y_r)], \]

which implies $E_{p^*}[w(j)] = p(Y_r)$. Therefore, 

\[ \hat{p}(Y_r) = \frac{1}{M} \sum_{j=1}^{M} w(j) \]

is an unbiased estimate of the likelihood $p(Y_r)$ for the imputation model. 

In summary, sequential imputation has three advantages over the data augmentation: (1) it does 
not require iterations; (2) it can directly estimate the model likelihood; (3) it can cheaply perform 
sensitivity analysis and influence analysis. However, it requires that $p(Y_1)$, $p(Y_t \mid Y_1, \ldots, Y_{t-1})$, and 
$p(\theta \mid Y)$ are all simple. Otherwise, it may not be feasible to implement the sequential imputation 
method. This is a very restrictive condition. 
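
Once the M weights and imputations are available, both the weighted estimate in (1.2) and the likelihood estimate reduce to simple sums. The sketch below assumes that the complete-data posterior mean of a scalar parameter has already been computed for each imputed data set; the function name and inputs are hypothetical and only illustrate how the weights are combined.

```python
def combine_sequential_imputations(weights, posterior_means):
    """Combine M importance weights w(j) with complete-data posterior means."""
    total = sum(weights)                                   # W in (1.2)
    # Weighted average of complete-data posterior means, following (1.2).
    posterior_mean = sum(w * m for w, m in zip(weights, posterior_means)) / total
    # Average weight: the unbiased estimate of the model likelihood p(Y_r).
    likelihood_hat = total / len(weights)
    return posterior_mean, likelihood_hat


estimate, likelihood = combine_sequential_imputations([1.0, 3.0], [0.0, 4.0])
```

Comparing `likelihood_hat` across candidate models, each run through its own sequential imputation, is the model comparison described above.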

1.6 Imputation practice across NCES surveys 

The following surveys conducted by the National Center for Education Statistics over the years 
used some method to impute for item nonresponse: 

Universe Surveys 

(1) Common Core of Data (CCD, conducted annually) 

(2) Private School Universe Survey (PSS, conducted biennially) 

(3) Integrated Postsecondary Education Data System (IPEDS): 

Institutional Characteristics (IPEDS-IC, conducted annually) 

Fall Enrollment (IPEDS-EF, conducted annually) 

Completions (IPEDS-C, conducted annually) 

Financial Statistics (IPEDS-F, conducted annually) 

Salaries, Tenure and Fringe Benefits of Full-Time Instructional Faculty (IPEDS-SA, 
conducted annually) 

Fall Staff (IPEDS-S, conducted biennially) 

Academic Libraries (IPEDS-L, conducted biennially) 

Sample Surveys 

(1) Schools and Staffing Survey (SASS, conducted in 1987-88, 1990-91, 1993-94) 

(2) SASS Teacher Follow-up Survey (SASS-TFS, conducted in 1988-89, 1991-92, 1994- 
95) 

(3) National Household Education Survey (NHES, conducted in 1991, 1993, 1995, 1996) 

(4) Recent College Graduates Survey (RCG, conducted in 1976, 1978, 1981, 1985, 1987, 
1991) 

(5) National Study of Postsecondary Faculty (NSOPF, conducted in 1988 and 1993) 

(6) National Assessment of Educational Progress (NAEP, conducted biennially since 1980 and 
annually from 1969 to 1980) 

(7) Third International Mathematics and Science Study (TIMSS, conducted in 1995) 

(8) National Postsecondary Student Aid Study (NPSAS, conducted at 3-year intervals since 
1986-87) 

Fast Response Surveys 

(1) Fast Response Survey System (FRSS; “College-Level Remedial Education in the Fall of 
1989,” conducted in 1990) 

(2) Postsecondary Education Quick Information System (PEQIS; “Deaf and Hard of Hearing 
Students in Postsecondary Education,” conducted in 1993) 

Imputation methods used across these surveys are presented in table 1.6.1. 

Table 1.6.1 — Imputation methods used across NCES surveys 

Survey      Imputation Methods Used 
----------  ------------------------------------------------------------------------ 
CCD         Ratio imputation and adjustment 
PSS         Sequential hot deck, ratio adjustment, deductive imputation 
IPEDS-IC    Ratio imputation, mean imputation 
IPEDS-EF    Ratio imputation, mean imputation, raking method 
IPEDS-C     Cold deck imputation, ratio imputation, raking method, mean imputation 
IPEDS-SA    Within-class ratio imputation, within-class mean imputation 
IPEDS-F     Ratio adjusted cold deck imputation, sequential hot deck imputation 
IPEDS-S     Ratio adjustment cold deck imputation, hot deck imputation 
IPEDS-L     Logical imputation, ratio adjustment 
IPEDS-ALS   Cold deck imputation, ratio imputation 
NSOPF       PROC IMPUTE, sequential hot deck 
SASS        Sequential hot deck, deductive imputation 
SASS-TFS    Sequential hot deck, deductive imputation 
RCG         Hot deck, within-class random imputation, deductive imputation 
NHES        Hot deck, manual imputation 
NPSAS       Hot deck, regression imputation, deductive 
NAEP        Multiple imputation based on Bayesian models* 
TIMSS       Multiple imputation based on Bayesian models* 
FRSS        Sequential hot deck imputation, mean imputation, and median imputation 
PEQIS       Sequential hot deck imputation, ratio adjustment 

* Multiple imputation techniques were applied to create plausible values for performance scores based on 
Item Response Theory. 

Chapter 2 Imputation Software Products 

2.1 PROC IMPUTE (See 1.2.6 Within-class random imputation) 

PROC IMPUTE is an advanced imputation software package created by the American Institutes for 
Research (AIR) under a contract with NCES. It is a stand-alone FORTRAN program and only 
works with ASCII data files. The software is in the public domain and users can obtain a copy 
through NCES. 

PROC IMPUTE is a regression-based distributional estimation procedure that is believed to be 
more general and to produce more accurate results than a standard hot deck procedure (AIR, 
1980). It considers each variable on the file in turn as a “target” variable whose missing values 
are to be filled in, and it uses information on other variables to minimize the error in imputing 
each target variable. PROC IMPUTE uses three steps that are similar to those used in a hot deck 
procedure to impute each target variable: 

(1) It uses stepwise regression analysis to find the best combination of predictors 
for each target variable; 

(2) It creates homogeneous cells (imputation classes) of records which have close 
predicted regression values; 

(3) It imputes each missing record in a given cell with a weighted average of two 
donors which are drawn from its own cell and its adjacent cell, respectively, 
with probability proportional to the observed frequencies within the two cells. 

The weighted average value is rounded to an integer if the integer flag is set for 
the target variable. 

The software also automatically creates missing data flags for each variable with a value of “I” 
for imputed values, “R” for reported values, and “A” for skip missing values. 

Since PROC IMPUTE involves ordinary multivariate regression analysis, it only works for 
continuous and dichotomous variables. Polytomous variables need to be recoded into 
dichotomous variables before running PROC IMPUTE. 

PROC IMPUTE can incorporate about 30 variables in one imputation model. A large data set 
needs to be divided into several subsets and each subset is imputed via a separate imputation 
model. Some key variables may be included in all imputation models. Note that PROC 
IMPUTE does not need to be run multiple times to impute a large data set, because of its batch 
run feature: one batch run can handle all the data no matter how large the 
data set is. 

PROC IMPUTE has two other important features. First, it can create as many as nine sets of 
imputations. Although it is not “proper” according to Rubin’s multiple imputation theory (Rubin 
1987), results of our simulation study (described in chapter 5) show that, in many situations, 
PROC IMPUTE provides better multiple imputation variance estimates than some “proper” 
methods. Second, it can perform within-class imputations through a “BY” statement which is 
parallel to a SAS “BY” statement. This feature is useful for stratified data where the user may 
want to perform imputations within each stratum. It is also convenient for Monte Carlo 
simulations where multiple data sets need to be generated so that the average performance over 
replications can be assessed. Using a “BY” statement with a data set identification variable, all 
data sets can be imputed through one run of PROC IMPUTE. 

2.2 Schafer’s imputation software (See 1.5.1 Data augmentation under Imputation methods 
related to Bayesian theories) 

Dr. Joseph Schafer of Pennsylvania State University developed this public domain software. 

The original version was written using S-PLUS functions and FORTRAN subroutines and ran 
under an S-PLUS environment. The current menu-driven version for Windows was written in 
FORTRAN 90. It only works with ASCII data files in which a numeric value is used to represent 
a missing value. It will not work if a is used as a missing value in the ASCII files. 

Schafer’s imputation software (Schafer 1997) applies the data augmentation method. Like the 
EM algorithm, it consists of two steps: (1) the I-step (imputation step) draws imputations for the 
missing values from the predictive distribution of the data given current parameter estimates; (2) 
the P-step (parameter estimation step) draws parameter estimates from their posterior 
distributions given both the observed and imputed data. To start this iterative process, the EM 
algorithm or ECM algorithm (Meng and Rubin 1991) may be used to obtain initial parameter 
estimates for the first I-step. 
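
The alternation of I-steps and P-steps can be illustrated with a minimal Python sketch for a 
single normally distributed variable. This is not Schafer's code: the function name 
data_augmentation, the noninformative prior, and the use of observed-data moments as 
starting values (in place of EM) are all assumptions made for illustration. 

```python
import numpy as np

def data_augmentation(y, n_iter=200, seed=0):
    """Alternate I-steps and P-steps for a univariate normal sample.

    y is a 1-D float array in which np.nan marks a missing value.
    Returns the completed data from the final iteration.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    n = y.size

    # Starting values: observed-data mean and variance (a stand-in for EM).
    mu = y[~missing].mean()
    sigma2 = y[~missing].var(ddof=1)

    for _ in range(n_iter):
        # I-step: draw the missing values given the current parameters.
        y[missing] = rng.normal(mu, np.sqrt(sigma2), size=missing.sum())

        # P-step: draw the parameters from their posterior given the
        # completed data (noninformative prior, an assumption of this sketch).
        ybar = y.mean()
        s2 = y.var(ddof=1)
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))

    return y
```

Running the function several times with different seeds would give several completed data 
sets, which is how multiple imputations are usually obtained from such a chain. 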

The software consists of three modules using different statistical models for continuous data, for 
categorical data, and for mixed continuous and categorical data. 

(1) For continuous data, the software assumes a multivariate normal distribution for the 
data, a normal prior for the mean parameters, and a normal-inverted Wishart prior for 
the variance-covariance parameters. Under these assumptions, the posterior 
distributions of the mean parameters and the variance-covariance parameters are 
multivariate normal and normal-inverted Wishart, respectively. Therefore, P-steps 
draw parameter estimates from these posterior distributions and I-steps draw 
imputations for missing values from their predictive normal distribution with updated 
parameter estimates obtained in the P-steps. 

(2) For categorical data, the software assumes a multinomial distribution for the data and 
a Dirichlet prior distribution for the parameters. Under this saturated multinomial 
model, the posterior distribution of the parameters (the cell probabilities) is also a 
Dirichlet distribution. However, as the number of categorical variables increases, the 
number of cells formed by the variables quickly becomes enormous. In these cases, 
the software imposes loglinear constraints (Bishop, Fienberg and Holland 1975) to 
reduce the number of parameters for estimation. For these constrained loglinear 
models, a Bayesian Iterative Proportional Fitting algorithm (Gelman, Rubin, Carlin and 
Stern 1995) is used to simulate the posterior distributions for the parameters. 

(3) For mixed continuous and categorical data, the software employs a general location 
model (Olkin and Tate 1961). It assumes a multinomial distribution for the categories 
defined by the categorical variables. Within each category, the continuous variables 
are assumed to have a multivariate normal distribution. The prior for the parameters in 
the multinomial distribution is a Dirichlet distribution and that for the parameters in the 
multivariate normal distribution is Jeffreys’ noninformative prior. In cases where the 
number of parameters becomes enormous, a loglinear constraint can be imposed on 
the multinomial parameters and a linear constraint on the mean parameters of the 
multivariate normal distribution. 

2.3 IRMA 

Imputation Run Manager (IRMA) is public domain software developed by Synectics for 
Management Decisions, Inc., under a contract with NCES. User permission can be obtained 
through NCES. 

IRMA is designed to supply a variety of imputation techniques to users. The current version 
of IRMA was built using Microsoft Visual Basic and includes two imputation techniques: (1) 
PROC IMPUTE and (2) Schafer’s Imputation Software. IRMA preserves all the features 
of PROC IMPUTE and Schafer’s Imputation Software and provides some enhanced features. 
For instance, while PROC IMPUTE and Schafer’s Imputation Software only work with ASCII 
files, IRMA works with SAS, SPSS, and ASCII data files. Another enhancement allows the 
unimputed input data file and the imputed output data file to be of different types. For example, 
the input file can be a SAS file, but the user can require IRMA to output the imputed file in 
SPSS format, or in both SPSS and SAS formats. More imputation methods will be added to a 
future version of IRMA. 
2.4 GEIS and GES 



Generalized Edit and Imputation System (GEIS) and Generalized Estimation System (GES) 
were developed by Statistics Canada. GEIS performs data editing and imputation functions 
while GES constructs point estimates and variance estimates using a number of different 
estimation modules. The software is a SAS-based application which runs under a SAS 
environment. Data must be either in SAS format or in ASCII format with fixed field positions. A 
site license for GEIS and GES costs $20,000 (CDN), and there is a $2,000 yearly maintenance 
fee. 

The imputation methods used in this software are nearest neighbor hot deck, current ratio, 
current mean, previous value, previous mean, and auxiliary trend, which are the key methods 
used by Statistics Canada for imputation of survey missing data. All of these are single and 
deterministic imputation methods and therefore suffer the disadvantage of deflating the variance 
estimates. 

2.5 SOLAS for Missing Data Analysis 1.0 

This commercial product was developed by Statistical Solutions Limited. A single user license 
costs $995 for commercial purposes and $795 for academic purposes. 

Imputation methods used in this software include: (1) Group Mean Imputation, which replaces 
missing values with the cell means of the sample; (2) Last Value Carried Forward (Sequential 
Hot Deck), in which the last observed value is used to fill in missing values at a later point in the 
study; and (3) Nearest Neighbor Hot Deck Imputation, in which missing values are replaced 
with values taken from the closest matching respondents. Multiple imputations can also be 
created by this software. These imputation methods are not very attractive for the purpose of 
statistical inference. Any statistician with some programming skill can easily implement these 
imputation algorithms. However, SOLAS can do more than imputation. It can also perform 
many standard statistical analyses based on imputed data, including descriptive analysis, 
cross-tabulation, statistical tests (t and non-parametric), ANOVA, regression, and BMDP survival 
analysis. 







Chapter 3 Nonresponse Bias 



Nonresponse bias is the bias of a survey estimate due to the difference between respondents 
and nonrespondents. It is one of the most important issues concerning survey data analysts. It is 
desirable to eliminate nonresponse bias through imputation and/or estimation methods. One way 
is to construct a so-called restoring estimator, defined by Rancourt, Lee, and Särndal (1994) 
as: 



Given the sample S, if the conditional expectation of the difference between an imputation 
estimator $y^*$ and the complete data estimator $y_s$ equals 0, i.e., $E(y^* - y_s \mid S) = 0$, 
where the expectation is over the response mechanism and the imputation model, then $y^*$ 
is called a restoring estimator. 

This actually is equivalent to the “first order proper” estimator defined by Rubin (1996). 

If missing values occur completely at random (MCAR), that is, if the survey has uniform 
response, then the respondents represent the population well and survey nonresponse causes 
no bias. However, this ideal missing mechanism rarely exists in real applications. 

The most commonly assumed missing mechanism is missing at random (MAR), which may 
more appropriately be called missing conditionally at random. MAR requires that 
respondents and nonrespondents have no systematic differences given some observed auxiliary 
variables (called conditioning variables in imputation literature). One simple example of MAR 
is that respondents and nonrespondents within each imputation class formed by some predictive 
auxiliary variables both represent random samples from the subpopulation. In this case, 
estimates within each imputation class will have no nonresponse bias, and thus the combined 
overall estimates will have no nonresponse bias. Therefore, under a MAR missing mechanism, 
nonresponse bias can be corrected through imputation by conditioning on the auxiliary variables 
that are related to the missing mechanism of the target variable. In real applications, we usually 
do not know which auxiliary variables are responsible for the missing values of the target 
variable. Thus many imputation pioneers such as Rubin and Little advocate using as many 
auxiliary variables as possible to make the missing mechanism as close to MAR as possible. 
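
The imputation class example above can be checked with a small simulation. The sketch 
below is illustrative only: the two classes, their means, and the response rates of 0.9 and 
0.5 are invented values, chosen so that the response probability depends on the observed 
class alone (MAR). 

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Auxiliary class indicator (fully observed) and target variable y.
cls = rng.integers(0, 2, size=n)
y = np.where(cls == 0, 10.0, 20.0) + rng.normal(0.0, 1.0, size=n)

# MAR: the response probability depends only on the observed class.
p_respond = np.where(cls == 0, 0.9, 0.5)
responded = rng.random(n) < p_respond

# The respondent-only mean ignores the class structure.
resp_mean = y[responded].mean()

# Within-class mean imputation conditions on the class.
y_imp = y.copy()
for c in (0, 1):
    in_class = cls == c
    y_imp[in_class & ~responded] = y[in_class & responded].mean()

full_mean = y.mean()
imputed_mean = y_imp.mean()
print(f"complete data:     {full_mean:.2f}")
print(f"respondents only:  {resp_mean:.2f}")
print(f"within-class mean: {imputed_mean:.2f}")
```

Because the low-response class has the larger mean, the respondent-only mean is pulled 
toward the high-response class, while the within-class estimate stays close to the complete 
data mean. 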

Different imputation methods use conditioning variables in different ways. Some ways are more 
effective than others depending upon the circumstances. The hot deck method uses conditioning 
variables as classification or matching variables; regression-type imputation uses conditioning 
variables as predictors through a regression model; and the data augmentation method uses the 
association between the target variable and auxiliary variables through a Bayesian model. These 
are the three most popular ways to use conditioning variables. Generally, the hot deck method is the 
simplest and most intuitive way; therefore it has been used the most often in past surveys. 
However, it may be the least effective way of using auxiliary variables. Due to the efforts of 
Rubin and many of his followers, the data augmentation method is becoming more and more 
popular. 

The most serious nonresponse bias situation is with confounded missing mechanisms; that is, the 
probability that a datum is missing depends on the target variable itself. More formally, 
confounded and unconfounded missing mechanisms may be defined as: 

Let R be the set of the respondents and S be the whole sample. A response mechanism 
$q(\cdot \mid S)$ is said to be unconfounded if it is of the form $q(R \mid S) = q(R \mid x_S)$; that is, it depends 
on the auxiliary variables only, and the response probabilities satisfy $P(k \in R \mid S) > 0$ for all units 
$k \in S$. If it depends on y-values as well, then it is called confounded. 

An unconfounded missing mechanism will become MAR if all auxiliary variables related to 
response probabilities are used as conditioning variables. A confounded missing mechanism can 
never become MAR. 

With a confounded missing mechanism, it is generally impossible to completely eliminate 
nonresponse biases unless the confounded missing mechanism is known. Unfortunately, the 
missing mechanism is never known in real applications. 

Rancourt, Lee, and Särndal (1994) discussed several estimators designed to correct 
nonresponse biases for data imputed via a ratio imputation method. These estimators, along with 
the ratio estimator and the observed-data-based estimator, were compared via a simulation study 
in terms of bias, MSE (mean square error), and coverage rate for a variety of missing 
mechanisms. Their results are summarized as follows. 

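Suppose that the data have been imputed via the ratio imputation method. The target variable is 
y and the fully observed auxiliary variable x is used to impute y. The whole sample S consists of 
n units with r respondents and m = n - r nonrespondents. The estimate of the population mean 
based on the observed values only is 

$$\bar{y}_r = \frac{1}{r} \sum_{k \in R} y_k .$$

The standard ratio imputation estimate is given by 

$$\bar{y}_{rimp} = \frac{1}{n} \Big( \sum_{k \in R} y_k + \sum_{j \in S-R} y_j^* \Big) = \frac{\bar{y}_r}{\bar{x}_r} \, \bar{x}_s ,$$

where $y_j^* = (\bar{y}_r / \bar{x}_r) x_j$ represents the imputed value for the j-th missing case, $\bar{x}_r$ is the 
mean of x over the respondents, and $\bar{x}_s$ is the mean of x over the whole sample S. 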
Under the ideal missing mechanism MCAR, $\bar{y}_r$ is unbiased and $\bar{y}_{rimp}$ is approximately unbiased. 
Under unconfounded missing mechanisms where missing probabilities only depend on x, $\bar{y}_r$ is 
generally biased but $\bar{y}_{rimp}$ is unbiased. If the missing mechanism is confounded, both $\bar{y}_r$ and 
$\bar{y}_{rimp}$ are generally biased. Rancourt, Lee, and Särndal suggest using 

$$\bar{y}_{crimp} = \bar{y}_r \left[ 1 + \left( 1 - \frac{r}{n} \right) \left( C \, \frac{\bar{x}_m}{\bar{x}_r} - 1 \right) \right]$$

to correct the biases for the ratio imputation estimator when the response mechanism is 
confounded, where $\bar{x}_m$ is the mean of x over the nonrespondents. When C = 1, $\bar{y}_{crimp}$ becomes 
the ratio imputation estimator $\bar{y}_{rimp}$. With correction factor 

$$C = \frac{\bar{y}_m / \bar{y}_r}{\bar{x}_m / \bar{x}_r} ,$$

it becomes unbiased, but it is obviously unestimable since $\bar{y}_m$ is not known. 
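
A short numerical check can make the role of the correction factor concrete. The sketch 
below assumes the corrected estimator has the form 
y_crimp = y_r [1 + (1 - r/n)(C x_m / x_r - 1)]; the function name ratio_estimators and its 
arguments are hypothetical and are not taken from Rancourt, Lee, and Särndal (1994). 

```python
import numpy as np

def ratio_estimators(x, y, responded, c=1.0):
    """Return (y_r, y_rimp, y_crimp) for a sample with ratio imputation.

    x is fully observed, y is used only where responded is True, and c is
    the correction factor C. At least one unit must be a nonrespondent.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    n, r = x.size, responded.sum()

    y_r = y[responded].mean()    # respondent mean of y
    x_r = x[responded].mean()    # respondent mean of x
    x_m = x[~responded].mean()   # nonrespondent mean of x

    # Ratio imputation estimator: (y_r / x_r) times the full-sample mean of x.
    y_rimp = (y_r / x_r) * x.mean()
    # Corrected estimator (assumed form, see the lead-in above).
    y_crimp = y_r * (1.0 + (1.0 - r / n) * (c * x_m / x_r - 1.0))
    return y_r, y_rimp, y_crimp
```

With c = 1 the third value equals the second. Plugging in the unestimable factor 
(y_m / y_r) / (x_m / x_r), which is available only in a simulation where the missing y values 
are known, returns the complete data mean exactly. 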

Eight correction factors C were considered by Rancourt, Lee, and Särndal (1994): 




and 

$K_i = 1 - (C_i^2 - 1)(R_y^2 - 1), \quad i = 1, 2, 3, 4,$ 

where $w_k$ corresponds to the rank of $x_k$. The $K_i$ take into account the correlation between x 
and y. The correction factors $C_1$, $C_3$, $K_1$, and $K_3$ are based on the observed data only, while the 
correction factors $C_2$, $C_4$, $K_2$, and $K_4$ are based on the whole sample S. Therefore, for 
convenience of description, $\bar{y}_{crimp}$ with $C_1$, $C_3$, $K_1$, or $K_3$ was called the r-corrected estimate, 
while $\bar{y}_{crimp}$ with $C_2$, $C_4$, $K_2$, or $K_4$ was called the S-corrected estimate. 

In their simulation study, Rancourt, Lee, and Särndal chose 

$$y_k = a + b x_k + c x_k^2 + e_k, \quad E(e_k) = 0, \quad V(e_k) = d^2 x_k$$

as simulation populations. Different types of populations are formed by setting the constants a, 
b, and c to different values: 

(1) RATIO: a = 0, c = 0; 

(2) CONCAVE: a = 0, c < 0 (c = -0.01 in the simulation); 

(3) CONVEX: a = 0, c > 0 (c = 0.01 in the simulation); 

(4) NONRATIO: a ≠ 0, b > 0, c = 0. 

Three correlation levels $\rho_{xy}$ = 0.7, 0.8, and 0.9 were obtained by a suitable value of d. 

Therefore, a total of 12 populations were considered: three RATIO, three CONCAVE, three 
CONVEX, and three NONRATIO with correlation levels 0.7, 0.8, and 0.9, respectively. 
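
As an illustration of how d can be tuned to reach a target correlation, the sketch below 
generates a RATIO population (a = 0, c = 0). For this case Var(y) = b^2 Var(x) + d^2 E(x), so 
setting corr(x, y) equal to rho gives d^2 = b^2 Var(x)(1/rho^2 - 1)/E(x). The gamma 
distribution for x and the function name ratio_population are assumptions of this sketch; 
the distribution of x used in the original study is not stated here. 

```python
import numpy as np

def ratio_population(n, b, rho, seed=0):
    """Generate (x, y) with y = b*x + e and V(e | x) = d^2 * x."""
    rng = np.random.default_rng(seed)
    # Positive auxiliary variable; the gamma choice is illustrative only.
    x = rng.gamma(shape=2.0, scale=10.0, size=n)
    # Solve corr(x, y) = rho for d^2 using the moments of the generated x.
    d2 = b**2 * x.var() * (1.0 / rho**2 - 1.0) / x.mean()
    # Heteroscedastic errors with variance proportional to x.
    e = rng.normal(0.0, np.sqrt(d2 * x))
    return x, b * x + e
```

The CONCAVE, CONVEX, and NONRATIO populations would need a different expression for d, 
since the quadratic term or the intercept changes the variance of y. 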
Five missing mechanisms were used in the simulation study: 

(M1) Uniform response (MCAR); 

(M2) The nonresponse probability is a decreasing function of $x_k$ specified as 
$\exp(-\gamma x_k)$. This is an unconfounded mechanism. 

(M3) The nonresponse probability is an increasing function of $x_k$ specified as 
$1 - \exp(-\gamma x_k)$. This is also an unconfounded mechanism. 

(M4) The nonresponse probability is a decreasing function of $y_k$ specified as 
$\exp(-\gamma y_k)$. This is a confounded mechanism. 

(M5) The nonresponse probability is an increasing function of $y_k$ specified as 
$1 - \exp(-\gamma y_k)$. This is also a confounded mechanism. 

The smaller units will be underrepresented in the response set R for (M2) and (M4), while the 
larger units will be underrepresented in the response set R for (M3) and (M5). The constant $\gamma$ is 
determined such that the average nonresponse rate is equal to one of the values 10 percent, 20 
percent, 30 percent, and 40 percent. 
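
One way to determine the constant for a given target rate is a simple bisection search, 
sketched below. The function names and the bisection approach are assumptions of this 
illustration; the source does not say how the constant was computed. The values in v must 
be positive. 

```python
import numpy as np

def nonresponse_prob(v, gamma, increasing):
    """Nonresponse probability exp(-gamma*v), or 1 - exp(-gamma*v) if increasing."""
    p = np.exp(-gamma * np.asarray(v, dtype=float))
    return 1.0 - p if increasing else p

def solve_gamma(v, rate, increasing, n_iter=100):
    """Find gamma so that the average nonresponse probability equals rate."""
    v = np.asarray(v, dtype=float)
    # mean(exp(-gamma*v)) decreases from 1 toward 0 as gamma grows (v > 0).
    target = 1.0 - rate if increasing else rate
    lo, hi = 0.0, 1.0
    while np.exp(-hi * v).mean() > target:
        hi *= 2.0  # expand the bracket until it contains the root
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.exp(-mid * v).mean() > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Passing the x values gives the unconfounded mechanisms (M2) and (M3); passing the y values 
gives the confounded mechanisms (M4) and (M5). A response indicator can then be drawn by 
comparing uniform random numbers with the resulting probabilities. 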

The ten estimates were compared in terms of bias, mean square error, and coverage rate of the 
95 percent confidence intervals. The primary findings are: 

(1) The r-corrected estimators (using $C_1$, $C_3$, $K_1$, $K_3$) performed very poorly since the 
correction only used the observed data for x; 

(2) For the uniform response mechanism (M1), both uncorrected estimators $\bar{y}_r$ and $\bar{y}_{rimp}$ 
perform better than the corrected estimators. However, the loss from mistakenly using 
the correction when it is not necessary under uniform nonresponse is not very severe; 

(3) For unconfounded missing mechanisms (M2) and (M3), the ratio imputation estimator 
$\bar{y}_{rimp}$ has the best performance for RATIO, CONCAVE, and NONRATIO 
populations, while the S-corrected estimators have the best performance for the 
CONVEX population; 

(4) For confounded mechanisms (M4) and (M5), $\bar{y}_{rimp}$ is better than the S-corrected 
estimators for CONCAVE and NONRATIO populations, but the S-corrected 
estimators are better than $\bar{y}_{rimp}$ for RATIO and CONVEX populations; 

(5) The observed-data-based estimator $\bar{y}_r$ performs poorly for all nonuniform response 
mechanisms. All estimators perform poorly for CONVEX populations with the (M5) 
response mechanism. 
All in all, the correction to the ratio imputation estimator is not a great success in this study. 
Correction with observed data of x (r-corrected estimators) should never be recommended. 
We will generally benefit from the S-corrected estimators with CONVEX populations. 







Chapter 4 Variance Estimation and Multiple Imputation 



One of the most common criticisms of the use of imputation for missing data is that it leads to 
underestimated variances. Generally, deterministic single imputation underestimates variances 
more seriously than random single imputation does. Rubin (1987) sees it as a 
disadvantage of single imputation that “. . . the one imputed value cannot in itself represent 
uncertainty about which value to impute: If one value were really adequate, then that value was 
never missing. Hence, analyses that treat imputed values just like observed values generally 
systematically underestimate uncertainty, even assuming the precise reasons for nonresponse are 
known.” In Rubin’s opinion, multiple imputation is needed to obtain “proper” variance 
estimates. 

However, Rao (1996) cites some disadvantages of multiple imputations: 

• significantly higher costs of storage and processing of multiple data sets; 

• general ABB methods for generating proper imputations that accommodate issues of 
clustering, stratification, and weighting to compensate for unequal probabilities of selection 
are not currently available; 

• a small number of imputations, m, may result in a low level of precision for the multiple 
imputation variance estimator since the between-imputation variance based on m - 1 degrees 
of freedom may be poorly estimated. 

This chapter summarizes and discusses three types of variance estimation methods for imputed 
survey data. Section 4.1 discusses the method proposed by Särndal (1992) which attempts to 
add imputation variances to the overall variance estimates without performing multiple 
imputation. Section 4.2 describes the application of jackknife variance estimation methods for 
imputed data (Rao 1996; Fay 1996). Inference based on multiply imputed data is discussed in 
section 4.3. 

4.1 Add imputation variance without multiple imputation 

Särndal (1992) tries to correct underestimated variances by adding the component of 
imputation variance to the sample variance for data imputed via a single imputation procedure. 

Suppose U is the population (N units), S is the sample (n units), and R is the set of respondents (r 
units). Denote the true value of the total by $t$, the estimate based on the complete data by $\hat{t}$, and 
the estimate based on the imputed data by $\hat{t}_{\bullet}$ (obtained via the same formula as $\hat{t}$). Our interest is 
the variance of $\hat{t}_{\bullet}$, since $\hat{t}_{\bullet}$ is the actual estimate used in the inference. 

The total error of $\hat{t}_{\bullet}$ can be decomposed as 

$$\hat{t}_{\bullet} - t = (\hat{t}_{\bullet} - \hat{t}) + (\hat{t} - t) = \text{imputation error} + \text{sampling error}.$$

We define the imputation residual as $e_k = y_k - y_k^*$, where $y_k^*$ is the imputed value; $e_k$ cannot be 
observed for a unit $k \in S - R$. Then the imputation error becomes 

$$\hat{t}_{\bullet} - \hat{t} = - \sum_{k \in S-R} w_k e_k .$$

The model-assisted approach considers three different distributions: the first is “with respect to the 
imputation model” (indicated by $\xi$), the second is “with respect to the sampling design” 
(indicated by S), and the third is “with respect to the response mechanism, given S” (indicated 
by R). The estimator $\hat{t}_{\bullet}$ is overall unbiased in the sense that $E_\xi E_S E_R (\hat{t}_{\bullet} - t) = 0$ if two 
conditions hold: 

(a) the order of the expectations can be changed: $E_\xi E_S E_R(\cdot) = E_S E_R E_\xi(\cdot)$; 

(b) imputation residuals have zero model expectation: $E_\xi(e_k) = 0$. 

Condition (a) is satisfied if the response mechanism is one that may depend on S and on 
auxiliary data, but not on the y- values. 

The overall variance of an unbiased estimator $\hat{t}_{\bullet}$ is 

$$V_{tot} = E_\xi E_S E_R [(\hat{t} - t) + (\hat{t}_{\bullet} - \hat{t})]^2 = E_\xi V_p + E_S E_R V_\xi = V_{sam} + V_{imp},$$

where $V_p = E_S(\hat{t} - t)^2$ is the design-based variance of $\hat{t}$, and $V_\xi = E_\xi[(\hat{t}_{\bullet} - \hat{t})^2 \mid S, R]$ is the 
conditional model-based imputation variance. In the above equation, we ignore the 
cross-product term. The argument for obtaining the sample variance $V_{sam}$ and the imputation variance 
$V_{imp}$ is as follows: 

(i) $V_{sam}$: Let $\hat{V}_p$ be the standard estimator of the design variance for a complete 
data set, and let $\hat{V}_{\bullet p}$ be the quantity obtained via the same formula for $\hat{V}_p$ using the 
imputed data. Evaluate the conditional expectation $E_\xi(\hat{V}_p - \hat{V}_{\bullet p} \mid S, R) = V_{dif}$, and find a 
model unbiased estimator $\hat{V}_{dif}$ for $V_{dif}$, which will usually require the estimation of 
certain parameters of the model $\xi$. 

(ii) $V_{imp}$: Find a model unbiased estimator $\hat{V}_\xi$ for $V_\xi$, which may again require the 
estimation of unknown parameters of the model $\xi$. Then $\hat{V}_\xi$ is overall unbiased for the 
imputation variance $V_{imp}$. 

Note that the role of $\hat{V}_{dif}$ is to correct for the fact that the data after imputation may display 
“less than natural” variation. This often happens when the imputed values equal the predicted 
value from a fitted regression, that is, “the value on the line”. The variation around the line is not 
reflected in the predicted value. As shown for the ratio imputation method, if predicted values 
plus residuals are used as imputed values, $\hat{V}_{dif}$ no longer needs to be added to the 
sample variance estimator. 

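Here is a simple example. Suppose the sample S is drawn with SRSWOR and the respondent 
mean $\bar{y}_R$ is imputed for all missing values. The corresponding imputation model $\xi$ states that 
$y_k = \mu + e_k$, where the $e_k$ are uncorrelated error terms with $E_\xi(e_k) = 0$, $V_\xi(e_k) = \sigma^2$. Then 

$$\hat{t}_{\bullet} = N \bar{y}_R ,$$

$$\hat{V}_p = N^2 (1/n - 1/N) \sum_S (y_k - \bar{y}_S)^2 / (n-1) = N^2 (1/n - 1/N) S_{yS}^2 ,$$

$$\hat{V}_{\bullet p} = N^2 (1/n - 1/N) \sum_R (y_k - \bar{y}_R)^2 / (n-1) = N^2 (1/n - 1/N) \frac{r-1}{n-1} S_{yR}^2 .$$

Since $E_\xi S_{yS}^2 = E_\xi S_{yR}^2 = \sigma^2$, 

$$E_\xi(\hat{V}_p - \hat{V}_{\bullet p} \mid S, R) = N^2 (1/n - 1/N) \frac{n-r}{n-1} \sigma^2 .$$

Therefore, $\hat{V}_{dif} = N^2 (1/n - 1/N) \frac{n-r}{n-1} S_{yR}^2$ is a model unbiased estimator for $V_{dif}$, 
which gives 

$$\hat{V}_{sam} = \hat{V}_{\bullet p} + \hat{V}_{dif} = N^2 (1/n - 1/N) S_{yR}^2 .$$

Since 

$$V_{imp} = E_\xi(\hat{t}_{\bullet} - \hat{t})^2 = \left(\frac{N}{n}\right)^2 (n-r)^2 E_\xi(\bar{y}_{S-R} - \bar{y}_R)^2$$

$$= \left(\frac{N}{n}\right)^2 (n-r)^2 \left\{ E_\xi \bar{y}_{S-R}^2 + E_\xi \bar{y}_R^2 - 2 E_\xi[\bar{y}_{S-R} \bar{y}_R] \right\} = \left(\frac{N}{n}\right)^2 (n-r)^2 \left[ \frac{\sigma^2}{n-r} + \frac{\sigma^2}{r} \right]$$

$$= N^2 (1/r - 1/n) \sigma^2 ,$$

we have $\hat{V}_{imp} = N^2 (1/r - 1/n) S_{yR}^2$. Therefore, $\hat{V}_{tot} = \hat{V}_{sam} + \hat{V}_{imp} = N^2 (1/r - 1/N) S_{yR}^2$. 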



The following table shows the contribution of each variance component to the total variance for 
SRSWOR using the mean imputation method for three different missing rates. Note that when 
the missing rate is 30 percent, the variance computed from the imputed data accounts for only 49 
percent of the total variance, while the variance due to imputation accounts for another 30 
percent. Thus the remaining 21 percent of the total variance needs to be added to the sampling variance. 
Table 4.1.1. Contribution of each variance component to the total variance for the 
SRSWOR sampling with the mean imputation method 

Missing rate in percentage    Contribution (in percentage) to V_tot 
100(1 - r/n)                  V_.p      V_dif     V_imp 
--------------------------    ------    ------    ------ 
10                            81        9         10 
20                            64        16        20 
30                            49        21        30 
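
The entries of the table can be reproduced from the formulas of the mean imputation example, 
since all three components share the factor N^2 S_yR^2. The following sketch, with the 
hypothetical function name variance_shares, computes the percentage contributions for given 
n, r, and N. 

```python
def variance_shares(n, r, N):
    """Percent contributions of V_.p, V_dif and V_imp to V_tot.

    Uses the SRSWOR mean imputation formulas; the common factor
    N^2 * S_yR^2 cancels and is therefore omitted.
    """
    fpc = 1.0 / n - 1.0 / N
    v_dot_p = fpc * (r - 1) / (n - 1)   # variance formula applied to imputed data
    v_dif = fpc * (n - r) / (n - 1)     # correction for "less than natural" variation
    v_imp = 1.0 / r - 1.0 / n           # imputation variance
    v_tot = v_dot_p + v_dif + v_imp
    return tuple(100.0 * v / v_tot for v in (v_dot_p, v_dif, v_imp))
```

For a large population and sample the shares are close to (r/n)^2, (r/n)(1 - r/n), and 
1 - r/n, which gives 49, 21, and 30 percent at a 30 percent missing rate. 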



The analytical formulas for SRSWOR sampling with the ratio imputation method have also been 
derived in Särndal (1992). 

As a comment on this approach, it is very convenient that imputation variance can be estimated 
without performing multiple imputation; therefore, there is no need for the large amount of 
storage space and processing time that multiple imputation requires. The variance estimates 
obtained through this method may be more accurate than those obtained through a small number 
of multiple imputations, since a small number of multiple imputations may lead to poor 
between-imputation variance estimation. However, Särndal (1992) only derived analytical formulas for 
two simple cases: SRSWOR sampling with the mean and ratio imputation. For a more complex 
survey design and/or more complicated imputation algorithms, the derivation is not trivial and 
may be impossible. It will be even more difficult to apply the method to nonlinear statistics such 
as the median, quartiles, and ratios. Furthermore, this method only takes care of variance estimates. It 
seems arduous to adjust for covariance via this method. 

To make this method more attractive, random imputation methods should be used instead of 
deterministic imputation methods, because deterministic imputation methods not only distort the 
distribution of data, but also require extra effort to estimate $V_{dif}$. 

4.2 Jackknife variance estimation with imputed data 

Rao (1996) and Fay (1996) extended the jackknife variance estimation method to imputed 
survey data. Rao (1996) discussed the jackknife method for imputed survey data for two 
situations: (1) stratified random sampling with ratio imputation and regression imputation; (2) 
stratified multistage sampling with cell mean imputation and weighted hot deck imputation. Fay 
(1996) applied the jackknife method to imputed data via fractionally weighted imputation. 

4.2.1 Jackknife variance estimation with imputed data for stratified random sampling 

Rao (1996) extended the jackknife variance estimation method to imputed survey data 
collected with a stratified random sampling design. Let $n_h$ be the sample size and $N_h$ be the 
population size for the h-th stratum (h = 1, 2, ..., L). In the case of complete data, a design-unbiased 
(p-unbiased) estimator of the population mean is given by $\bar{y} = \sum_{h=1}^{L} W_h \bar{y}_h$, where $W_h = N_h / N$ is 
the weight for stratum h and $\bar{y}_h$ is the h-th stratum sample mean. The jackknife variance 
estimator is given by 

$$v_J(\bar{y}) = \sum_{h=1}^{L} W_h^2 \, \frac{n_h - 1}{n_h} \sum_{j=1}^{n_h} \left( \bar{y}_h^{(-j)} - \bar{y}_h \right)^2 ,$$

where $\bar{y}_h^{(-j)}$ is the jackknife sample mean obtained by deleting the j-th observation from the 
h-th stratum. 

In the presence of nonresponse, let $A_{rh}$ and $A_{mh}$ be the samples of respondents and nonrespondents 
in stratum h. The jackknife sample mean $\bar{y}_h^{a(-j)}$ can be adjusted in the following way: (1) 
under deterministic imputation, if a respondent is left out, all the imputed values should be 
adjusted by the amount $y_{hi}^{*(-j)} - y_{hi}^{*}$, where $y_{hi}^{*(-j)}$ is the value that one would impute for the i-th 
nonrespondent if the j-th respondent is deleted in the h-th stratum; (2) under stochastic 
imputation, if a respondent is excluded, each of the imputed values in stratum h should be 
adjusted by an average amount $E_*^{(-j)} y_{hi}^{*} - E_* y_{hi}^{*}$, where $E_*$ denotes expectation with respect to 
the imputation procedure given the donor set and $E_*^{(-j)}$ is the expectation with respect to the 
imputation procedure when the donor set is modified by excluding unit j. Then the jackknife 
variance estimator with imputed data is given by 

$$v_J(\bar{y}_I) = \sum_{h=1}^{L} W_h^2 \, \frac{n_h - 1}{n_h} \sum_{j=1}^{n_h} \left( \bar{y}_h^{a(-j)} - \bar{y}_{Ih} \right)^2 ,$$

where $\bar{y}_{Ih} = \big( \sum_{i \in A_{rh}} y_{hi} + \sum_{i \in A_{mh}} y_{hi}^{*} \big) / n_h$ is the stratum mean with imputed data and 
$\bar{y}_I = \sum_h W_h \bar{y}_{Ih}$ is the overall sample mean with imputed data. 

The following two examples apply this technique to ratio imputation and regression imputation. 

Example 1 (ratio imputation). Suppose that an auxiliary variable x closely related to an item y is 
observed on all sample units. Ratio imputation uses $y_{hi}^{*} = (\bar{y}_{rh} / \bar{x}_{rh}) x_{hi}$ as imputed values for the i-th 
nonrespondent in the h-th stratum. Under this deterministic imputation procedure, if the j-th 
respondent is excluded in the jackknife variance estimation, the imputed value will be 
$(\bar{y}_{rh}^{(-j)} / \bar{x}_{rh}^{(-j)}) x_{hi}$. 

A stochastic counterpart of ratio imputation adds the donors’ residuals to the above ratio 
imputed values. Under this imputation approach, $E_* y_{hi}^{*} = (\bar{y}_{rh} / \bar{x}_{rh}) x_{hi}$ and 
$E_*^{(-j)} y_{hi}^{*} = (\bar{y}_{rh}^{(-j)} / \bar{x}_{rh}^{(-j)}) x_{hi}$. Thus the adjusted imputed values are given by 

$$y_{hi}^{*} + (\bar{y}_{rh}^{(-j)} / \bar{x}_{rh}^{(-j)}) x_{hi} - (\bar{y}_{rh} / \bar{x}_{rh}) x_{hi} .$$
Example 2 (regression imputation). Again assume that x is observed on all sample units. Linear 
regression imputation uses $y_{hi}^{*} = \bar{y}_{rh} + \hat{\beta}_{rh} (x_{hi} - \bar{x}_{rh})$, where $\hat{\beta}_{rh}$ is the ordinary least squares 
regression coefficient based on the respondents in stratum h. Under this deterministic imputation 
procedure, when the j-th respondent is deleted in the jackknife variance estimation, the imputed 
values will be $y_{hi}^{*(-j)} = \bar{y}_{rh}^{(-j)} + \hat{\beta}_{rh}^{(-j)} (x_{hi} - \bar{x}_{rh}^{(-j)})$, where $\hat{\beta}_{rh}^{(-j)}$ is the least squares regression 
coefficient when the j-th respondent is deleted. 

A stochastic counterpart of regression imputation adds a donor’s residual to the above 
imputations, where the donor is selected through simple random sampling. Under this 
approach, we have $E_* y_{hi}^{*} = \hat{y}_{hi}$ and $E_*^{(-j)} y_{hi}^{*} = \hat{y}_{hi}^{(-j)} = \bar{y}_{rh}^{(-j)} + \hat{\beta}_{rh}^{(-j)} (x_{hi} - \bar{x}_{rh}^{(-j)})$, where 
$\hat{y}_{hi} = \bar{y}_{rh} + \hat{\beta}_{rh} (x_{hi} - \bar{x}_{rh})$ is the deterministic regression prediction. Thus the 
adjusted imputed values are given by $y_{hi}^{*} + \hat{y}_{hi}^{(-j)} - \hat{y}_{hi}$ if the j-th respondent is deleted and 
remain unchanged if the j-th nonrespondent is deleted. 

In these two examples, the imputed estimators of the mean are approximately design-unbiased
under uniform response within each stratum, as well as design-model unbiased under their
super-population models (defined in sections 1.3.1 and 1.3.2). The jackknife variance
estimators are p-consistent, as well as approximately design-model unbiased under their
super-population models.

Rao (1996) also discussed jackknife variance estimation for stratified multistage sampling designs
with missing data imputed by the class mean imputation method and the weighted within-class
hot deck method. We omit them here because they are parallel to the two examples given
above. Linearized versions of the jackknife variance estimators, which are useful with computer
programs that use the linearization method of variance estimation (e.g., SUDAAN), are also
provided in that paper.

However, as Judkins (1996) pointed out, this jackknife method is essentially a univariate tool 
with well behaved extensions only for variables that are either never missing or are missing or 
present in whole blocks. It has only been applied to simple statistics such as total, mean or 
functions of total or mean under marginal imputation. For more complex statistics, such as 
regression and correlation coefficients, marginal imputation often attenuates the association 
between variables. Joint imputation from the same donor, called common donor hot deck, may 
be used sometimes to alleviate this problem with marginal imputation when a record has several 
missing related values. This method preserves bivariate relationships only when both variables 
are missing; that is, when there are no partial nonrespondents with respect to the two variables. 

4.2.2 Jackknife variance estimation with fractionally weighted imputation 

Fay (1996) discussed the application of the jackknife variance estimation method to survey data 
imputed through the fractionally weighted imputation (FWI) method. FWI creates one set of 
imputations by fractionally weighting m sets of imputations. In general, FWI assigns a weight
1/m to each of the m imputations. If the original analysis is weighted, then the m imputed values
each receive 1/m times the original weight.



Let $A_r$ and $A_{nr}$ be the samples of respondents and nonrespondents, respectively, n be the total
sample size, and r be the number of respondents. For any data imputed via a single imputation
method, the mean may be estimated by

$$\bar{y}_I = \frac{1}{n}\Big(\sum_{i \in A_r} y_i + \sum_{i \in A_{nr}} y^*_i\Big) = \frac{r\,\bar{y}_r + (n-r)\,\bar{y}^*_{nr}}{n},$$

where $\bar{y}_r$ and $\bar{y}^*_{nr}$ are the mean of the reported values of the respondents and the mean of
imputed values for the nonrespondents, respectively. The standard jackknife variance estimator
is

$$v_J = \frac{n-1}{n}\sum_{j=1}^{n}\big(\bar{y}_I^{(-j)} - \bar{y}_I\big)^2,$$

where $\bar{y}_I^{(-j)}$ is the mean of the completed data set with the j-th unit deleted.
This naive jackknife variance estimate treats the imputed values as true observed values. Rao
and Shao (1992) modified this jackknife mean by

$$\bar{y}_I^{(-j)a} = \begin{cases} \dfrac{1}{n-1}\Big[r\,\bar{y}_r - y_j + \sum_{i \in A_{nr}}\big(y^*_i + \bar{y}_r^{(-j)} - \bar{y}_r\big)\Big], & j \in A_r, \\ \bar{y}_I^{(-j)}, & j \in A_{nr}, \end{cases}$$

where $\bar{y}_r^{(-j)} = (r\,\bar{y}_r - y_j)/(r - 1)$ is the mean of the (r - 1) respondents without the j-th observation.
This formula reflects that, when a respondent is deleted, each imputed value $y^*_i$ needs to be
adjusted by the amount $(\bar{y}_r^{(-j)} - \bar{y}_r)$, since we only have r - 1 respondents for imputation when
the j-th respondent is left out. For example, for the mean imputation method, the originally imputed
values are $y^*_i = \bar{y}_r$ for all $i \in A_{nr}$, and the adjusted imputed value is $\bar{y}_r^{(-j)}$ when the j-th respondent is
left out.
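
The effect of this adjustment under mean imputation can be illustrated numerically. The following Python sketch is not part of the report: it assumes a simple random sample with no strata, the delete-one jackknife with factor (n - 1)/n, and a hypothetical function name `jackknife_variances`. It computes the naive jackknife variance, which treats imputed values as observed, and the adjusted version, in which the imputed values are shifted to the mean of the remaining r - 1 respondents whenever a respondent is deleted.

```python
import numpy as np

def jackknife_variances(y_resp, n):
    """Naive and adjusted jackknife variances of the imputed mean.

    Assumes deterministic mean imputation in a simple random sample of
    size n, of which len(y_resp) units responded (illustrative only).
    """
    y_resp = np.asarray(y_resp, dtype=float)
    r = y_resp.size
    ybar_r = y_resp.mean()

    # Completed data: respondents followed by n - r values imputed as ybar_r.
    completed = np.concatenate([y_resp, np.full(n - r, ybar_r)])
    ybar_I = completed.mean()

    # Naive jackknife: delete each unit, imputed values left untouched.
    naive_means = (n * ybar_I - completed) / (n - 1)
    v_naive = (n - 1) / n * np.sum((naive_means - ybar_I) ** 2)

    # Adjusted jackknife: deleting respondent j re-imputes every missing
    # value with the mean of the remaining r - 1 respondents.
    ybar_r_minus_j = (r * ybar_r - y_resp) / (r - 1)
    adj_resp = ((r * ybar_r - y_resp) + (n - r) * ybar_r_minus_j) / (n - 1)
    # Deleting a nonrespondent leaves the imputed values unchanged.
    adj_nonresp = np.full(n - r, (n * ybar_I - ybar_r) / (n - 1))
    adj_means = np.concatenate([adj_resp, adj_nonresp])
    v_adj = (n - 1) / n * np.sum((adj_means - ybar_I) ** 2)
    return v_naive, v_adj

v_naive, v_adj = jackknife_variances([1.0, 2.0, 3.0, 4.0], n=6)
```

In this toy case the adjusted variance is larger than the naive one by the factor ((n - 1)/(r - 1))^2, which reflects that only r values, not n, carry information about the mean.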



For fractionally weighted imputations, the mean may be estimated by

$$\bar{y}_{FWI} = \frac{1}{n}\Big(\sum_{i \in A_r} y_i + \sum_{j \in A_{nr}} \frac{1}{m}\sum_{l=1}^{m} y^*_{jl}\Big),$$

where $y^*_{jl}$ is the l-th imputation for the j-th missing value. The Rao-Shao type jackknife variance
estimate may be constructed by replacing $\bar{y}_I^{(-j)a}$ with the corresponding adjusted mean computed
from the fractionally weighted imputations.

Fay (1996) claimed that “unlike MI (multiple imputation), the RS (Rao-Shao type) variance
estimator does not use variation among the m different imputed sets. . . . Because the effect of
missing data is incorporated in the variance calculation as a whole, instead of isolated . . . for MI,
it is generally unnecessary to reference a t distribution to obtain adequate approximation for
construction of confidence intervals” (p. 492).



In some situations, Rubin’s multiple imputation (non-proper MI) inference may have inconsistent
variance estimates. A modified version of the Rao-Shao type jackknife variance estimate may be
used. In this jackknife variance estimate, the first sum of squares consists of the usual jackknife
terms, and the second sum of squares is designed to capture the variation usually added by proper
multiple imputations.

Fay (1996) points out, “FWI resembles MI but may be distinguished by (a) the manner in which 
the imputations are made, (b) the procedures to obtain the estimates from the data set, and (c) 
the variance estimation and analysis of the resulting data set” (p. 492). 

Some anomalies given by Fay demonstrate that MI does not perform effectively in some
relatively simple situations. This is not surprising because, as Judkins (1996) pointed out,
“Fay’s fractionally weighted imputation (FWI) can be expected to yield true variance no larger
than multiple imputation with the same number of replicates” (p. 508). Based on his findings, Fay
suggests that researchers implement Monte Carlo studies to examine the performance
characteristics of MI to develop a body of systematic evidence before applying it to specific
problems.

However, Fay’s FWI method is subject to the same limitation as Rao’s jackknife described
in the preceding section; that is, it is basically a univariate tool and hard to extend to the
multivariate case. Rubin (1996) further criticizes the limitation of Fay’s FWI method: “Fay’s
approach is essentially constrained to the special situation where (a) there is the simplest pattern
of nonresponse (i.e., there are respondents with no missing data and nonrespondents with all
outcome variables missing), (b) hot-deck draws (possibly weighted) are made from each
adjustment cell to impute donor values to nonrespondents, (c) there are effectively an unlimited
number of respondent donors in each adjustment cell, and (d) the adjustment cell classification
and design weights are assumed to control adequately for nonresponse biases for all estimands
of interest. Since hot-deck classification is based on observed variables, Fay’s approach
implicitly assumes an ignorable nonresponse mechanism, because otherwise (d) is violated” (p.
515).

4.3 Multiple imputation inference 

The discussion in this section is based on Rubin (1996). 

4.3.1 Objectives of imputations 

The basic objective of imputation is to allow ultimate data users to apply their existing analysis 
tools to any dataset with missing values using the same command structure and output standards 
as if there were no missing data. Certain ad hoc methods of handling missing data, such as
“complete-case analysis,” “available-case analysis,” and “fill-in with means,” satisfy this basic
objective and so have a certain appeal.

The ideal supplemental objective of imputation is that each complete-data statistical tool can be
applied to each incomplete dataset to obtain the same inference as if the dataset had no missing
values. This objective is obviously unachievable no matter what imputation method is used. It is
analogous to saying that the objective of a survey is to obtain the same answer as a complete
census.

A less-ideal achievable supplemental objective could be as follows. Assuming that the ultimate
user’s complete-data analysis is statistically valid for a scientific estimand, the answer that results
from applying the same analysis method to an incomplete dataset remains statistically valid for the
same scientific estimand, assuming the truth of the database constructor’s posited model for
missing data. This supplemental goal can be achieved through some imputation methods, but
cannot be achieved through others.

Before we discuss multiple imputation inference, let’s first clarify the meanings of scientific 
estimands and statistical validity. 

Scientific Estimands: Quantities of scientific interest that can be calculated in the population
and do not change their values depending on the data collection design used to measure them (i.e.,
they do not vary with sample size and survey design, or the number of nonrespondents or
follow-up efforts). For example, scientific estimands include population means, variances,
correlations, factor loadings, and regression coefficients, but exclude the sampling variance of a
sample mean under a particular sampling plan and the expectation of the complete-data sample
mean when missing values are filled in with zero or the observed sample means.

Statistical Validity: This must be a frequency concept, averaging over randomization
distributions generated by known sampling mechanisms and posited distributions for the response
mechanisms. Bayesian validity is also important, but is far more difficult to achieve in this context
because it requires far more compatibility between the database constructor and the analyst.

First and foremost, to achieve statistical validity for scientific estimands, point estimation must be 
approximately unbiased for the scientific estimands, averaging over the sampling and the posited 
nonresponse mechanisms. Second, interval estimation and hypothesis testing must be valid in the 
sense that nominal levels describe operating characteristics over sampling and posited response 
mechanisms. There are two versions of frequentist validity for nominal levels: randomization 
validity and confidence validity. Randomization validity means that, for interval estimates, the 
actual interval coverage equals the nominal interval coverage, and for tests of hypotheses, the 
actual rejection rate equals the nominal rejection rate. Confidence validity means that, for 
interval estimates, the actual coverage rate is greater than or equal to the nominal coverage rate, 
and for tests of hypotheses, the actual rejection rate is less than or equal to the nominal rejection 
rate. Confidence validity is a more generally achievable objective. 



To express the concepts in mathematical equations, let X be the array of all background
information fully observed in a population and Y be the array of outcome information in the
population that is to be sampled in the survey. $Q = Q(X, Y)$ is a scientific estimand. Suppose $\hat{Q}$
is a complete-data estimate of Q with sampling variance consistently estimated by the statistic
U. Then randomization validity with complete data is equivalent to

$$E(\hat{Q} \mid X, Y) \doteq Q \quad \text{(unbiasedness of point estimate)}$$

and

$$E(U \mid X, Y) \doteq \mathrm{Var}(\hat{Q} \mid X, Y) \quad \text{(unbiasedness of variance estimate)}.$$

For confidence validity with complete data, the second condition is replaced by

$$E(U \mid X, Y) \ge \mathrm{Var}(\hat{Q} \mid X, Y).$$



4.3.2 Multiple imputation inference 

The goal of multiple imputation (sometimes also called repeated imputation) is to provide
statistically valid inference in the difficult real-world situation where (1) ultimate users and
database constructors are distinct entities with different analyses, models, and capabilities, and
(2) there typically is no one accepted reason for the missing data.

Multiple imputation was designed to satisfy both the achievable basic objective and the
achievable supplemental objective stated in the preceding subsection by using Bayesian and
frequentist paradigms in complementary ways: the Bayesian model-based approach to create
procedures, and the frequentist (randomization-based) approach to evaluate procedures.

Multiple imputation is based on the following Bayesian results:

$$P(Q \mid Y_{obs}) = \int P(Q \mid Y_{obs}, Y_{mis})\,P(Y_{mis} \mid Y_{obs})\,dY_{mis},$$

or in words

(Actual posterior distribution of Q) = AVE (complete-data posterior distribution of Q),

where AVE (complete-data posterior distribution of Q) refers to the average over the repeated
imputations, which are draws from $P(Y_{mis} \mid Y_{obs})$, the posterior predictive distribution of
missing data given the observed data. About the first two moments, we have:

$$E(Q \mid Y_{obs}) = E[E(Q \mid Y_{obs}, Y_{mis}) \mid Y_{obs}]$$

or in words

(Posterior mean of Q) = AVE (repeated complete-data posterior means of Q)

and

$$V(Q \mid Y_{obs}) = E[V(Q \mid Y_{obs}, Y_{mis}) \mid Y_{obs}] + V[E(Q \mid Y_{obs}, Y_{mis}) \mid Y_{obs}].$$

Suppose that we have m sets of repeated imputations, and the l-th (l = 1, 2, ..., m) point estimate
and its corresponding variance-covariance estimate based on the l-th set of imputed data using
standard formulas are $(\hat{Q}_{*l}, U_{*l})$. Then the repeated-imputation estimate of Q is:

$$\bar{Q}_m = \sum_{l=1}^{m} \hat{Q}_{*l}/m.$$

The associated variance-covariance of $\bar{Q}_m$ is:

$$T_m = \bar{U}_m + \frac{m+1}{m}\,B_m,$$

where $\bar{U}_m = \sum_{l=1}^{m} U_{*l}/m$ is the within-imputation variability, and

$$B_m = \frac{1}{m-1}\sum_{l=1}^{m} (\hat{Q}_{*l} - \bar{Q}_m)(\hat{Q}_{*l} - \bar{Q}_m)^T$$

is the between-imputation variability. We expect:

$$(Q - \bar{Q}_\infty) \sim N(0, T_\infty),$$

where $\bar{Q}_\infty = \lim_{m \to \infty} \bar{Q}_m$ and $T_\infty = \lim_{m \to \infty} T_m$.
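
As an illustration of these combining rules, the following Python sketch computes the repeated-imputation estimate and its total variance for a scalar estimand. It is not taken from the report: the function name `combine_repeated_imputations` and the toy inputs are hypothetical, and the scalar case is assumed for simplicity (for a vector estimand the between-imputation term would be a covariance matrix).

```python
import numpy as np

def combine_repeated_imputations(q_hats, u_hats):
    """Combine m completed-data estimates of a scalar estimand.

    q_hats: point estimates, one per imputed data set.
    u_hats: their completed-data variance estimates.
    Returns (Q_bar, U_bar, B, T) following the rules in the text.
    """
    q = np.asarray(q_hats, dtype=float)
    u = np.asarray(u_hats, dtype=float)
    m = q.size
    q_bar = q.mean()                 # repeated-imputation estimate
    u_bar = u.mean()                 # within-imputation variability
    b = q.var(ddof=1)                # between-imputation variability
    t = u_bar + (m + 1) / m * b      # total variance
    return q_bar, u_bar, b, t

q_bar, u_bar, b, t = combine_repeated_imputations([1.0, 2.0, 3.0],
                                                  [0.5, 0.5, 0.5])
```

With these toy inputs, the within-imputation variance is 0.5, the between-imputation variance is 1.0, and the total variance is 0.5 + (4/3)(1.0).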

A “proper” multiple imputation procedure treats (X, Y) and the intended sample (as indicated by
I) as fixed, and deals with the fixed but unknown values of the complete-data statistics $(\hat{Q}, U)$ in
the sample as if they were estimands. That is, the randomization distribution critically involved in
the definition of proper multiple imputation is generated by the response mechanism, in which X,
Y, and I are fixed, and the response indicator R is the random variable. That means a proper
imputation must satisfy the following:

$$E(\bar{Q}_\infty \mid X, Y, I) = \hat{Q} \qquad (4.1)$$

$$E(\bar{U}_\infty \mid X, Y, I) = U \qquad (4.2)$$

$$E(B_\infty \mid X, Y, I) = \mathrm{Var}(\bar{Q}_\infty \mid X, Y, I) \qquad (4.3)$$

The definition of proper concerns the situation where “population” equals complete-data
sample, “estimands” equals complete-data statistics $(\hat{Q}, U)$, and “survey design” equals the
posited response mechanism. The criterion is valid frequency inference, and the method for
creating inferences is Bayesian predictive inference using simulated values.

It follows from (4.1)-(4.3) that, if the complete-data inference is randomization-valid and the
multiple imputation procedure is proper, the infinite-m repeated-imputation inference is
randomization-valid under the posited response mechanism.

Rubin (1987, chapter 4) presented analytic results, simulation evaluations, and many examples
of proper and improper multiple imputation methods, where the evaluations were all from the
random-response randomization-based frequentist perspective. The trick in many of the
examples of proper imputation was to get the variance condition (4.3) correct, and it was
shown that when drawing imputations to approximate repetitions from a sensible Bayesian
model, conditions (4.1)-(4.3) typically followed automatically. The more straightforward
conditions, (4.1) and (4.2), typically were simple properties of any intelligent imputation scheme
that tried to track the data. An example of a method that does not track the data is “fill in the
mean,” which, although it may satisfy (4.1) for $\hat{Q} = \bar{y}$, fails to do so for $\hat{Q} = s^2$ or for the 25th
percentile, or to satisfy (4.2) for $U = s^2/n$, etc. Hot deck (bootstrap) and random-draw
regression methods tend to satisfy (4.1) and (4.2) but fail to satisfy (4.3) until a Bayesian,
systematic between-imputation component of variability is added (e.g., via the Bayesian
Bootstrap), to reflect uncertainty in the estimation of population parameters.

A multiple imputation procedure is strongly superefficient for the complete-data statistic $\hat{Q}$ if,
first, $\bar{Q}_\infty$ and $\hat{Q}$ estimate the same estimand, that is, the procedure is “first-moment proper” for
$\hat{Q}$: $E(\bar{Q}_\infty \mid X, Y) = E(\hat{Q} \mid X, Y)$, and second, $\bar{Q}_\infty$ has no larger variance than the complete-data
estimate itself: $\mathrm{Var}(\bar{Q}_\infty \mid X, Y) \le \mathrm{Var}(\hat{Q} \mid X, Y)$. If the second condition is replaced by
$\mathrm{Cov}(\bar{Q}_\infty, \hat{Q} \mid X, Y) \le \mathrm{Var}(\hat{Q} \mid X, Y)$, then it is called superefficient imputation. Strongly
superefficient imputation implies superefficient imputation.

A multiple imputation procedure is confidence-proper for the complete-data statistics $(\hat{Q}, U)$ if
the imputations are “first-moment proper” for $(\hat{Q}, U)$ and

$$E(\bar{U}_\infty \mid X, Y) = E(U \mid X, Y)$$

and if $B_\infty$ conservatively estimates the “excess variance” of $\bar{Q}_\infty$ over $\hat{Q}$:

$$E(B_\infty \mid X, Y) \ge \mathrm{Var}(\bar{Q}_\infty \mid X, Y) - \mathrm{Var}(\hat{Q} \mid X, Y).$$

If a multiple imputation procedure is proper for $(\hat{Q}, U)$, it is confidence-proper for $(\hat{Q}, U)$. If the
complete-data inference based on $(\hat{Q}, U)$ is confidence-valid and the multiple imputation
procedure is confidence-proper for $(\hat{Q}, U)$, then the repeated-imputation inference is confidence-valid
no matter how complex the survey design.

According to Rubin (1996), any imputation method that satisfies the validity objective in 
generality must not only reflect the underlying response mechanism but must also be a random 
draw method. Nonrandom draw methods can be applied in special cases but require special 
analysis techniques. Of course, the development of user-friendly appropriate software for 
creating multiple imputations and analyzing multiply-imputed data is still badly needed.

Rubin (1996) also advises including all variables in a multiple imputation model to make it
proper in general. If X is correlated with Y but not used to multiply-impute Y, then the
multiply-imputed dataset will yield estimates of the (X, Y) correlation biased towards zero. Thus, the
danger with an imputer’s model is generally in leaving out predictors rather than including too
many, and the advice has always been to include as many variables as possible when doing
multiple imputation. Nevertheless, because problems can occur when the imputer’s model
leaves out important predictor variables, the database constructor must include a description of
the imputation model with the multiply-imputed database, so that ultimate users know which
relationships among variables have been implicitly set to zero. This is obviously good advice in 
principle, but it may be difficult to do in practice. 

4.3.3 Current issues concerning multiple imputation 

Rubin (1996) also discussed current issues concerning multiple imputation. The first issue
focuses on its implementation: operational difficulties for the database constructor and the
ultimate user, as well as the acceptability of answers obtained partially through the use of
simulation. The second issue concerns the frequentist validity of repeated-imputation inferences
when the multiple imputations are not proper, but appear “reasonable” in some sense.
Specifically, Rubin raised four questions and tried to answer them:

(1) Is multiple imputation unprincipled or unacceptable because it uses simulation?

It is critical to remember that multiple imputation does not pretend to create information through
simulated values but simply to represent the observed information this way to make it amenable
to valid analysis using complete-data tools. The extra noise created when using a finite number
of imputations is the price to be paid for this luxury.

With multiple imputation, the simulation is only being used to handle the missing information, with
reliance for handling the rest of the information left to the complete-data method, be it analytic
or simulation-based. Jackknife and bootstrap use many more simulations. More explicitly,
hundreds or thousands of simulations will be needed for bootstrap or jackknife methods,
whereas as few as five multiple imputations (or even three in some cases) are adequate under
each model for nonresponse. The asymptotic efficiency of the repeated-imputation finite-m
estimate relative to the infinite-m estimate is $[1 + (\gamma/m)]^{-1/2}$ in units of standard deviations,
which is close to one with realistic fractions of missing information $\gamma$ and modest m.
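
A quick calculation shows how close to one this efficiency is. The Python sketch below simply evaluates the expression above; the function name `relative_efficiency` and the example values (30 percent missing information, m = 3 or 5) are illustrative assumptions, not figures from the report.

```python
def relative_efficiency(gamma, m):
    """Efficiency (in standard-deviation units) of the finite-m estimate
    relative to the infinite-m estimate: [1 + gamma/m] ** (-1/2)."""
    return (1.0 + gamma / m) ** -0.5

# Illustrative values: 30 percent missing information.
eff_m5 = relative_efficiency(0.3, 5)   # about 0.97
eff_m3 = relative_efficiency(0.3, 3)   # about 0.95
```

Even with only three imputations, the standard deviation of the estimate in this example is within about 5 percent of what infinitely many imputations would give.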

(2) Is multiple imputation too much work for the user? 

(3) Does it take too much work to create proper or approximately proper multiple 
imputations? 

(4) Can repeated imputations under an appropriate Bayesian model lead to invalid 
inferences? 

His arguments on these three questions are not very convincing and therefore are not repeated
here. There are no “right” answers to questions (2) and (3). Different people may have different
opinions. Regarding question (4), Fay (1996) seems to give a “yes” answer; that is, it is possible
that multiple imputation under a Bayesian model may lead to invalid inferences.

Chapter 5 Simulation Study



5.1 Simulation design 

The simulation design factors are described as follows. 

5.1.1 Distribution 

Four sets of variables were generated for the simulation study. The distribution type and name 
of each of the variables generated are described below. 

(1) Five variables from $N(\mu, 1)$ denoted as Norm1, Norm2, Norm3, Norm4, and Norm5
with $\mu$ = 1, ..., 5, respectively;

(2) Five variables from a double exponential distribution denoted as Dexp1, Dexp2,
Dexp3, Dexp4, and Dexp5 with means of 1, 2, 3, 4, and 5, respectively, and variances
equal to 2;

(3) Five variables from mixed normal distributions (i.e., 95 percent $N(\mu, 1)$ and 5 percent
$N(\mu, 3^2)$) denoted as MixNorm1, MixNorm2, MixNorm3, MixNorm4, and
MixNorm5 with $\mu$ = 1, ..., 5, respectively;

(4) Five variables from mixed normal distributions (i.e., 95 percent $N(\mu, 1)$ and 5 percent
$\chi^2(4) - 4 + \mu$) denoted as MixNChi1, MixNChi2, MixNChi3, MixNChi4, and
MixNChi5 with $\mu$ = 1, ..., 5, respectively.

The first three sets of variables were symmetric about their means, while the fourth set of
variables was right skewed. The five variables in each set had means of 1, 2, 3, 4, and 5,
respectively. Each set of five variables was correlated with the following correlation matrix:

The correlation coefficients between different sets of variables were small.

5.1.2 Missing mechanism

(1) MCAR: Missing values in variables Norm1, Dexp1, MixNorm1, and MixNChi1 were
missing completely at random (MCAR);

(2) Tail values more likely missing (unconfounded): Missing values in Norm2
were created with probability of $\exp(-\lambda\,|\mathrm{Norm1} - 1|)$, where $\lambda$ was determined so that
on average 10 percent, 20 percent, 30 percent, and 40 percent missing values were
generated for the four missing rate categories under study. This was an unconfounded
missing mechanism. Since Norm1 and Norm2 were positively correlated with
correlation coefficient 0.9, tail values of Norm2 were missing with higher probabilities. Missing
values in Dexp2, MixNorm2, and MixNChi2 were similarly created using Dexp1,
MixNorm1, and MixNChi1;

(3) Large values more likely missing (unconfounded): Missing values in Norm3
were created with probability of $\exp[-\lambda\,(\mathrm{Norm2} - 2)]$, where $\lambda$ was determined so
that on average 5 percent, 10 percent, 15 percent, and 20 percent missing values
were generated for the four missing rate categories under study. This was an
unconfounded missing mechanism. Since Norm2 and Norm3 were positively
correlated with correlation coefficient 0.8, large values of Norm3 were missing with
higher probabilities. Missing values in Dexp3, MixNorm3, and MixNChi3 were
similarly created using Dexp2, MixNorm2, and MixNChi2;

(4) Center values more likely missing (unconfounded): Missing values in Norm4
were created with probability of $1 - \exp[-\lambda\,|\mathrm{Norm3} - 3|]$, where $\lambda$ was determined so
that on average 10 percent, 20 percent, 30 percent, and 40 percent missing values
were generated for the four missing rate categories under study. This was an
unconfounded missing mechanism. Since Norm3 and Norm4 were positively
correlated with correlation coefficient 0.7, center values of Norm4 were missing with
higher probabilities. Missing values in Dexp4, MixNorm4, and MixNChi4 were
similarly created using Dexp3, MixNorm3, and MixNChi3;

(5) Tail values more likely missing (confounded): Missing values in Norm5
were created with probability of $1 - \exp[-\lambda\,|\mathrm{Norm5} - 5|]$, where $\lambda$ was determined so
that on average 10 percent, 20 percent, 30 percent, and 40 percent missing values
were generated for the four missing rate categories under study. This was a
confounded missing mechanism since the probabilities of missing Norm5 depended on
itself. Missing values in Dexp5, MixNorm5, and MixNChi5 were similarly created.
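
The report does not say how $\lambda$ was determined for each target missing rate. One possible approach, sketched below in Python purely as an assumption, is to solve for $\lambda$ by bisection so that the average missing probability over the generated values equals the target rate. The sketch uses the probability form $1 - \exp(-\lambda\,|x - c|)$ from mechanism (5); the function names `calibrate_lambda` and `make_missing` and all numeric settings are hypothetical.

```python
import numpy as np

def calibrate_lambda(x, center, target_rate, lo=0.0, hi=50.0, tol=1e-8):
    """Bisection for lambda so that mean(1 - exp(-lambda*|x - center|))
    equals target_rate (assumed calibration rule, not from the report)."""
    dist = np.abs(np.asarray(x, dtype=float) - center)

    def mean_rate(lam):
        return np.mean(1.0 - np.exp(-lam * dist))

    # The mean missing probability increases with lambda, so bisect.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_rate(mid) < target_rate:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def make_missing(y, x, center, target_rate, rng):
    """Set entries of y to NaN with probability 1 - exp(-lambda*|x - center|)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    lam = calibrate_lambda(x, center, target_rate)
    p_missing = 1.0 - np.exp(-lam * np.abs(x - center))
    mask = rng.random(x.size) < p_missing
    return np.where(mask, np.nan, y), lam

rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, size=10_000)
# Confounded case: missingness in the variable depends on its own values.
y_with_gaps, lam = make_missing(x, x, center=5.0, target_rate=0.2, rng=rng)
```

Passing a different variable as `x` than as `y` would mimic the unconfounded mechanisms, where missingness depends on a correlated variable rather than on the target itself.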

We use the term “one-side missing mechanism” for mechanism (3) and the term “two-side
missing mechanism” for the other four mechanisms for the convenience of description.

5.1.3 Missing rates



For missing mechanisms (1), (2), (4), and (5), the four types of missing rates were 10
percent, 20 percent, 30 percent, and 40 percent, while for missing mechanism (3), the
four types of missing rates were 5 percent, 10 percent, 15 percent, and 20 percent.

5.1.4 Imputation methods 

(1) Mean Imputation (deterministic): Missing values were replaced with the sample 
mean. 



(2) Ratio Imputation (deterministic): Missing values in y were replaced by

$$y^*_i = \frac{\bar{y}_{obs} - 1}{\bar{x}_{obs}}\,x_i + 1,$$

where $\bar{y}_{obs}$ and $\bar{x}_{obs}$ were the means of the observed values for the target variable and
auxiliary variable, respectively. Norm1, Norm2, Norm3, and Norm4 served as
auxiliary variables for Norm2, Norm3, Norm4, and Norm5, respectively.

Since the means of the target variables were one more than the means of the auxiliary
variables, we subtracted 1 from the numerators of the ratios and added 1 back to the
final imputed values. This means that we used the ratio imputation model $E(y - 1) = \beta x$
instead of $E(y) = \beta x$ because the latter model led to very bad results.

We did not use ratio imputation for Norm1 since we needed to create a complete
auxiliary variable to start the ratio imputation process. Because missing values in
Norm1 were missing completely at random, we started with this variable and imputed
its missing values using the mean with disturbance method described in (5) below.

The other three sets of five variables were imputed in the same way as the normal
variables.



(3) Sequential nearest neighbor hot deck method (deterministic): This is also called the
traditional hot deck method. To impute any one of the five variables in each set, the
data were first sorted by the other four variables of that set. The observed mean
served as the starting stored value. Then the sequential imputation process started to
check each record in the sorted data file. If a record had a response for the target
variable, the stored value was updated by this new response value; if a record missed
the target variable, the currently stored value would serve as the imputation value.

(4) Random imputation method (random): Randomly drew imputations from the
observed values (with replacement).

(5) Mean imputation with disturbance (random): Random disturbances drawn from
$N(0, s^2)$ were added to the mean imputation (1), where $s^2$ is the sample variance.

(6) Ratio imputation with disturbance (random): Random disturbances drawn
from $N(0, s^2)$ were added to the ratio imputation (2), where $s^2$ is the sample variance.

(7) Approximate Bayesian Bootstrap (ABB) method (random): First drew r values
randomly with replacement from the observed values $Y_1, \ldots, Y_r$ to create $Y^*_{obs}$, and then
drew m values randomly with replacement from $Y^*_{obs}$ for imputation, where r and m
were the number of observed values and the number of missing values, respectively.

(8) Bayesian Bootstrap (BB) method (random): First, drew r-1 uniform random
numbers between 0 and 1, and let their ordered values be $a_1 \le \ldots \le a_{r-1}$; also let $a_0 = 0$
and $a_r = 1$, where r was the number of respondents. Then, drew each of the m missing
values by drawing from $Y_1, \ldots, Y_r$ with probabilities $(a_1 - a_0), (a_2 - a_1), \ldots, (1 - a_{r-1})$;
that is, independently m times, drew a uniform random number u, and imputed $Y_l$ if
$a_{l-1} < u \le a_l$ (l = 1, 2, ..., r).

(9) PROC IMPUTE (random): First, used a stepwise regression approach to find
the best regression equations and then used the predicted regression values to form
the “optimal” imputation classes. Then, for each missing record, two observed values
were drawn and weighted to form the imputation value. One of the two observed
values was drawn according to the estimated distribution of the observed values from
its own imputation class and the other from the nearest imputation class.

(10) Data Augmentation (random): This Bayesian iterative method assumed two
distributions: the distribution of the data and the prior distribution of the parameters.
The imputation process consisted of two steps: (i) I-step: with current parameter
estimates, drew imputations for the missing values from the predicted distribution of
the data; (ii) P-step: with both the observed data and the imputed values of the missing
data, drew parameter estimates from their posterior distribution. To start this iterative
process, we may use the EM algorithm to obtain initial parameter estimates for the first
I-step. Schafer’s software was used to implement this method in our simulation. This
software assumes multivariate normal distribution for the data, and normal prior for the
parameters of means and normal-inverted Wishart for the variance-covariance
parameters.

(11) Adjusted data augmentation method (random): If the normality assumption for
the continuous data in Schafer’s software is in question, it is desirable to let the
observed data $Y_{obs}$ influence the shape of the distribution of values imputed for $Y_{mis}$.
We can accomplish this as follows. First $\mu^*$ and $\sigma^{*2}$ were drawn in the same way
from their posterior distributions as in Schafer’s software. Then the components of the
m-dimensional vector $X = (X_1, \ldots, X_m)$ were drawn with replacement from $Y_{obs}$. Under
repeated draws from $Y_{obs}$, the standardized variable

$$Z_i = \frac{X_i - \bar{y}_{obs}}{\sqrt{(r-1)\,s^2_{obs}/r}}$$

had expected value 0 and variance 1. Finally, the m components of $Y_{mis}$ were set
equal to $\mu^* + \sigma^* Z_i$, i = 1, 2, ..., m.
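
The two bootstrap-based methods, (7) and (8), can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the function names `abb_impute` and `bb_impute` are hypothetical, and NumPy's `Generator.choice` with a probability vector is used in place of the explicit "draw a uniform number and find its interval" step, which selects donors with the same probabilities.

```python
import numpy as np

def abb_impute(y_obs, n_missing, rng):
    """Approximate Bayesian Bootstrap: resample the r observed values
    with replacement, then draw the imputations from that resample."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_star = rng.choice(y_obs, size=y_obs.size, replace=True)
    return rng.choice(y_star, size=n_missing, replace=True)

def bb_impute(y_obs, n_missing, rng):
    """Bayesian Bootstrap: donor probabilities are the gaps between
    r - 1 ordered uniform numbers on (0, 1), with a_0 = 0 and a_r = 1."""
    y_obs = np.asarray(y_obs, dtype=float)
    r = y_obs.size
    cuts = np.sort(rng.random(r - 1))
    a = np.concatenate([[0.0], cuts, [1.0]])
    probs = np.diff(a)  # (a_1 - a_0), ..., (1 - a_{r-1}); sums to 1
    return rng.choice(y_obs, size=n_missing, replace=True, p=probs)

rng = np.random.default_rng(1)
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
abb_values = abb_impute(observed, 7, rng)
bb_values = bb_impute(observed, 7, rng)
```

In both functions the extra randomness in the first step (the resample, or the random donor probabilities) is what separates these methods from the plain random imputation in (4).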

For each combination formed by the above simulation factors, 200 replicate runs were 
performed. We assessed the imputation methods based on their average performance over the 
200 replications. The sample size for each replicate data set was 100. 
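
Two of the deterministic methods described in section 5.1.4, the shifted ratio imputation (2) and the sequential hot deck (3), can be sketched as follows. The Python code is illustrative only: the function names are hypothetical, missing values are assumed to be coded as NaN, and the auxiliary mean in the ratio is assumed to be taken over the records where the target variable is observed, a detail the report does not spell out.

```python
import numpy as np

def shifted_ratio_impute(y, x, shift=1.0):
    """Ratio imputation under E(y - shift) = beta * x:
    y*_i = ((ybar_obs - shift) / xbar_obs) * x_i + shift."""
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    miss = np.isnan(y)
    # Assumption: xbar_obs is computed over records with y observed.
    ratio = (y[~miss].mean() - shift) / x[~miss].mean()
    y[miss] = ratio * x[miss] + shift
    return y

def sequential_hot_deck(y, sort_keys):
    """Sequential hot deck: sort by the other variables, start from the
    observed mean, and carry the last observed value forward."""
    y = np.asarray(y, dtype=float).copy()
    # np.lexsort treats the LAST key as primary, so reverse the key order.
    order = np.lexsort(np.asarray(sort_keys, dtype=float)[::-1])
    stored = np.nanmean(y)          # starting stored value
    for idx in order:
        if np.isnan(y[idx]):
            y[idx] = stored         # impute with the stored value
        else:
            stored = y[idx]         # update the stored value
    return y

nan = float("nan")
ratio_filled = shifted_ratio_impute([3.0, nan, 5.0], [1.0, 2.0, 3.0])
deck_filled = sequential_hot_deck([1.0, nan, 3.0, nan], [[0, 1, 2, 3]])
```

In the ratio example, the observed means are 4 and 2, so the missing value is imputed as ((4 - 1)/2)(2) + 1 = 4; in the hot deck example, each missing record receives the value of the preceding respondent in sort order.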

5.2 Simulation results 

We compared the imputation methods in terms of bias of parameter estimates (mean, median, 
first and third quartiles), bias of variance estimates (single and multiple imputations), coverage 
probability, confidence interval width, and average imputation error. Analyses and conclusions 
according to each criterion based on the simulation results follow. The detailed simulation results
are presented in tables 5.2.1.1-5.2.7.5.

5.2.1 Bias of population mean estimates 

Tables 5.2.1.1-5.2.1.5 present the biases of population mean estimates for the 11 imputation
methods under study. Table 5.2.1.1 combines the four missing rate categories with overall
missing rates of around 25 percent for missing mechanisms (1), (2), (4), and (5), and about 10
percent for missing mechanism (3). The remaining four tables describe the biases for missing
rate categories 10 percent, 20 percent, 30 percent, and 40 percent. The numbers of missing
values for one-side missing mechanism (3) are about half of those for the other four two-side
missing mechanisms.

For symmetric distributions (normal, double exponential, and mixed normal) and two-side
missing mechanisms, the population mean estimates based on the incomplete data are
theoretically unbiased. Therefore, the values in the first three rows in each block except block 3
of tables 5.2.1.1-5.2.1.5 are all close to zero. For these cases, it does not make much
sense to compare the imputation methods in terms of improvement of biases.

When large values are more likely to be missing, block 3 of table 5.2.1.1 shows that the
negative biases caused by missing values, which are the same as those for the mean imputation 
method, are considerable for all four types of distributions although there are only about 10 
percent missing values. As the distributions depart further from normal, the biases become more 
and more serious. The ratio imputation method, ratio imputation with disturbance method, and 
Schafer’s software perfectly corrected the biases. PROC IMPUTE and the sequential nearest 
neighbor hot deck method improved the biases substantially, but PROC IMPUTE has a 
significant advantage over the hot deck method. Since the adjusted data augmentation method 
introduces more impact of the observed data and the observed data are biased for missing 
mechanism (3), this method results in only slight (negligible) improvement of the biases. All other 
imputation methods are helpless with the nonresponse biases because these methods do not use 
any auxiliary information from other variables. 

We believe that one reason the ratio imputation method performs so well is that we 
used the same variables to create and to impute the missing values for each target variable. The 
second reason is the high correlation coefficients (at least 0.6) between the target variables and 
the auxiliary variables used by the ratio imputation method. The ratio imputation method is more 
sensitive to the model specification because it directly uses the predicted values from the 
equations as imputation values. In fact, when we used the ratio imputation model $E(y) = \beta x$ 
instead of $E(y - 1) = \beta x$ in our first attempt, the results were worse than those of any other method. 

Later we subtracted 1 from y so that the means of y - 1 and x were equal. But this is not a 
requirement of the ratio imputation method. It is more natural for many analysts to consider the 
model $E(y) = \beta x$ to impute y with auxiliary variable x rather than $E(y - 1) = \beta x$. Therefore, we 
should be very cautious in the selection of ratio imputation models in real applications where the 
underlying missing mechanisms and the data distributions are generally unknown. 
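
The sensitivity to model specification can be illustrated with a small Python sketch. It assumes the classical ratio estimator (the sum of the observed y values over the sum of the observed x values) for the coefficient; the text above does not state which estimator the study used, and the function name `ratio_impute` and its `shift` parameter are hypothetical.

```python
def ratio_impute(x_obs, y_obs, x_mis, shift=0.0):
    """Impute y for the cases in x_mis under the model E(y - shift) = beta * x."""
    # ASSUMPTION: beta is the classical ratio estimator on the shifted values.
    beta = sum(y - shift for y in y_obs) / sum(x_obs)
    return [shift + beta * x for x in x_mis]


# Toy data in which y is exactly x + 1, so E(y - 1) = x holds.
x_obs = [1.0, 2.0, 3.0]
y_obs = [2.0, 3.0, 4.0]

well_specified = ratio_impute(x_obs, y_obs, [4.0], shift=1.0)  # [5.0]
misspecified = ratio_impute(x_obs, y_obs, [4.0])               # [6.0]
```

With the shift of 1 the imputed value equals the true value 5, whereas the unshifted model overshoots because it forces the fitted line through the origin.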

The fourth row of each block in tables 5.2.1.1-5.2.1.5 presents the biases for the right-skewed 
distribution, the mixture of 95 percent normal and 5 percent Chi-square. These biases are not 
severe when the missing rates are low. As the missing rates increase, the biases become 
considerable. For the MCAR missing mechanism, all imputation methods are supposed to 
provide unbiased mean estimates. For missing mechanisms (2) and (5), since tail values are 
more likely missing and the right side has more tail values with the right skewed distributions, the 
mean estimates based on the incomplete data will underestimate the population mean. It is 
evident that the biases with the confounded mechanism (5) are much more serious than with the 
unconfounded mechanism (2). On the other hand, for missing mechanism (4), when center 
values are more likely missing, the estimates based on the incomplete data tend to overestimate 
the population mean. But the right skewness will not have as much effect with this missing 
mechanism as with missing mechanisms (2) and (5) since center values have much less effect on 
the mean estimates than tail values. That is why row 4 of block 4 in tables 5.2.1.2-5.2.1.4 does 
not show positive biases. However, the positive biases are substantial in row 4 of block 4 in 
table 5.2.1.5 when the missing rate increases to 40 percent. 

We found earlier that ratio imputation with or without disturbance, Schafer’s software, PROC 
IMPUTE, and hot deck are all very effective in improving the biases caused by missing 
mechanism (3). However, the improvement is much less impressive for the biases caused by the 
right skewness of the distributions, although these methods can still provide improvement in 
most cases when considerable biases exist with the incomplete data. Overall, they are still a little 
better than the other methods. 

Table 5.2.1.1 — Bias of population mean estimates (overall *) 

Table 5.2.1.2 — Bias of population mean estimates with about 10% missing values 

Missing Mechanism   Distribution   Mean Imp.   Ratio Imp.   Hot Deck   Random   Mean+e   Ratio+e   ABB   BB   Proc Impute   Schafer   Adj DA 

1. MCAR Normal -0.019 -0.023 -0.021 -0.021 -0.024 -0.014 -0.019 -0.021 -0.020 

Table 5.2.1.3 — Bias of population mean estimates with about 20% missing values 

Missing Mechanism   Distribution   Mean Imp.   Ratio Imp.   Hot Deck   Random   Mean+e   Ratio+e   ABB   BB   Proc Impute   Schafer   Adj DA 

1. MCAR Normal -0.003 0.003 0.001 -0.005 0.002 -0.010 -0.007 -0.011 0.000 

Table 5.2.1.4 — Bias of population mean estimates with about 30% missing values * 

Missing Mechanism   Distribution   Mean Imp.   Ratio Imp.   Hot Deck   Random   Mean+e   Ratio+e   ABB   BB   Proc Impute   Schafer   Adj DA 

1. MCAR Normal 0.017 0.044 0.013 0.019 0.010 0.005 0.023 0.014 0.014 

Table 5.2.1.5 — Bias of population mean estimates with about 40% missing values * 

Missing Mechanism   Distribution   Mean Imp.   Ratio Imp.   Hot Deck   Random   Mean+e   Ratio+e   ABB   BB   Proc Impute   Schafer   Adj DA 

1. MCAR Normal -0.018 0.023 -0.021 -0.027 -0.002 -0.015 -0.010 -0.006 -0.009 

5.2.2 Bias of variance estimates with single imputation 



Tables 5.2.2.1-5.2.2.5 report the relative biases of variance estimates based on the incomplete 
data and the data imputed by the 11 methods. The relative biases are defined as: 

$$\text{Relative Bias} = \frac{(\text{Estimated Var}) - (\text{True Var})}{\text{True Var}} \tag{5.1}$$



In this formula, we are discussing the variance among the data $\mathrm{Var}(y_i)$, not the variance of the 
mean estimates $\mathrm{Var}(\bar{y})$, although the relative biases of the two variance estimates are equal for 
all the imputation methods. We will use the statement “the variance is 20 percent overestimated” 
if the relative bias is 0.20, and say “the variance is 20 percent underestimated” if the relative bias 
is -0.20. 
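
As a small worked illustration of the relative bias, the Python sketch below computes it for a variance that is 20 percent overestimated. It also shows why the data-level and mean-level relative biases coincide, under the assumption (made here for illustration) that the variance of the mean is the data variance divided by the sample size n. The function name `relative_bias` is hypothetical.

```python
def relative_bias(estimated_var, true_var):
    """Relative bias: (estimate - truth) / truth."""
    return (estimated_var - true_var) / true_var


n = 100                     # sample size per replicate
estimate, truth = 1.2, 1.0  # variance among the data, 20 percent too large

data_level = relative_bias(estimate, truth)
# ASSUMPTION: Var(mean) = Var(data) / n, so the factor 1/n cancels.
mean_level = relative_bias(estimate / n, truth / n)
```

Both values are 0.20 up to floating-point rounding, which corresponds to the statement "the variance is 20 percent overestimated."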



For the MCAR missing mechanism, the variance estimates based on the incomplete data are 
supposed to be unbiased, which was confirmed by the simulation. It is to be expected that the 
mean imputation method seriously underestimates the variances since the data were centralized 
by using the mean as the imputed values for all missing cases. One way to correct this 
underestimation is to multiply the variance estimates by the factor (n-1)/(r-1), where n is the 
sample size and r is the number of observed values. The other way is to add random variation 
to the mean as imputation values as done by the mean with disturbance imputation method. 
Actually, the variance estimates based on the incomplete data and those based on the mean with 
disturbance imputation method are always approximately equal across all missing mechanisms 
and all distributions. 
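
The effect of the correction factor can be checked numerically. The Python sketch below is an illustration with made-up data and hypothetical function names (`mean_impute`, `corrected_variance`): filling the missing cases with the observed mean leaves the sum of squared deviations unchanged while raising the divisor from r - 1 to n - 1, so multiplying by (n-1)/(r-1) recovers the observed-data variance.

```python
import statistics


def mean_impute(y_obs, n_missing):
    """Fill every missing case with the mean of the observed values."""
    y_bar = statistics.fmean(y_obs)
    return list(y_obs) + [y_bar] * n_missing


def corrected_variance(imputed, r):
    """Sample variance of mean-imputed data rescaled by (n - 1) / (r - 1)."""
    n = len(imputed)
    return statistics.variance(imputed) * (n - 1) / (r - 1)


y_obs = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # r = 6 observed values (made up)
imputed = mean_impute(y_obs, 4)         # n = 10 after imputation

naive = statistics.variance(imputed)            # too small
fixed = corrected_variance(imputed, len(y_obs)) # equals variance of y_obs
```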

For MCAR, all other methods seem fine except the sequential hot deck method which provides 
a few very large variance estimates for the mixed distribution of 95 percent normal and 5 
percent Chi-square. For example, the sequential hot deck overestimated the variance by 70 
percent and 24 percent respectively when there are 40 percent and 30 percent missing values. 
This is probably because some extremely large values were imputed too many times by the hot 
deck sequential imputation scheme. Therefore, the sequential hot deck imputation method is 
dangerous even for MCAR missing mechanism if extreme values or outliers exist in the 
observed data. For other distributions, the hot deck method works well. 

For unconfounded missing mechanism (2) where tail values are more likely missing, the 
incomplete data shrink to the center and, therefore, the variance estimates based on the 
incomplete data are too small. This underestimation is much less serious than for the confounded 
missing mechanism (5) where tail values are also more likely missing but the missing probabilities 
depend on the target variable itself. For mechanism (2), Schafer’s software performs better than 
the ratio imputation, which is better than PROC IMPUTE, which is better than the hot deck 
method. However, all four methods dramatically improved the negative biases of the variance 
estimates. The ratio imputation with disturbance method tends to overestimate the variances. 
Slight improvement has been found with the adjusted data augmentation method. It is evident 
and expected that the BB, ABB, random, and mean with disturbance imputation methods 
all have almost the same variance estimates as the incomplete data, while the mean imputation 
method worsens the variance estimates. 

For unconfounded missing mechanism (3) where large values are more likely missing, the 
incomplete data have shorter range than the complete data; therefore, the incomplete data will 
underestimate the true variance. Since the missing rates are always less than 20 percent, the 
underestimation of the variances is not severe. Except for one case, all negative biases are 
smaller than 11 percent of the true variances. In these cases, all imputation methods except 
mean imputation provide fine variance estimates. However, Schafer’s software, ratio imputation, 
PROC IMPUTE, and the hot deck method still show some advantage over the other methods. 

For unconfounded missing mechanism (4) where center values are more likely missing, the 
incomplete data overestimate the variances and so do the random, mean imputation with 
disturbance, ratio imputation with disturbance, ABB, and BB methods, while the mean 
imputation still underestimates the variances. These methods cannot improve the positive biases 
at all. Overall, Schafer’s software has the best performance, followed by the hot deck method, 
which is followed by PROC IMPUTE, which is followed by the ratio imputation. All four 
methods substantially improved the positive biases of variance estimates. The hot deck method 
has one bad case in which it overestimates the variance by 23 percent for the mixture of normal 
and Chi-square when the missing rate is 40 percent, but it is still a significant improvement over 
the incomplete data which overestimate the variance by 37 percent. Again, the adjusted data 
augmentation method can improve the biases slightly. 

For confounded missing mechanism (5) where tail values are more likely missing and the missing 
probabilities depend on the target variable itself, the incomplete data underestimate the 
variances much more seriously than for unconfounded missing mechanism (2). Again, the 
random, mean imputation with disturbance, ratio imputation with disturbance, ABB, and BB 
methods do not help at all with the biases. Schafer’s software, adjusted data augmentation, and 
the hot deck method only slightly improve them. PROC IMPUTE shows improvement only with 
the mixed distribution of normal and Chi-square, which has much more seriously underestimated 
variances than the other distributions. For this distribution, PROC IMPUTE is better than 
Schafer’s software, adjusted data augmentation, and the hot deck method. For this confounded 
missing mechanism, the only methods which can substantially improve the biases in variance 
estimates are ratio imputation with or without disturbance. These two methods are the only ones 
in this study that directly use auxiliary variables to predict missing values. This probably implies 
that we may have to use some directly predictive approach such as regression imputation or 
ratio imputation to impute missing values if the missing mechanism is confounded; that is, if the 
missing probabilities depend on the target variable itself. 

In summary, for the MCAR missing mechanism, all imputation methods can provide acceptable 
variance estimates except the mean imputation method, which needs to be adjusted with a 
factor of (n-1)/(r-1). For unconfounded missing mechanisms, Schafer’s software performs best, 
and ratio imputation, PROC IMPUTE, and the hot deck method can all improve the biases of 
variance estimates dramatically, but the ratio imputation with disturbance method tends to 
overestimate the variance. For the confounded missing mechanism, only the ratio imputation 
method with or without disturbance substantially improves the biases. The random, ABB, BB, 
and mean imputation with disturbance methods are almost equivalent to the incomplete data for 
all missing mechanisms, while the adjusted data augmentation method always helps a little, but 
never much. 

Table 5.2.2.1 — Relative bias of variance estimates with single imputation (overall *) 

Table 5.2.2.2 — Relative bias of variance estimates with single imputation with about 10% missing values * 

Table 5.2.2.3 — Relative bias of variance estimates with single imputation with about 20% missing values * 

Table 5.2.2.4 — Relative bias of variance estimates with single imputation with about 30% missing values * 

Table 5.2.2.5 — Relative bias of variance estimates with single imputation with about 40% missing values * 

5.2.3 Bias of variance estimates of population mean with five sets of imputations 



Five sets of imputations were created for the eight random imputation methods under study. 
Variance estimates based on the five sets of multiple imputations are obtained through Rubin’s 
multiple imputation theory: 



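$$V = \frac{1}{m}\sum_{i=1}^{m} U_i + \left(1 + \frac{1}{m}\right)\frac{1}{m-1}\sum_{i=1}^{m}\left(Q_i - \bar{Q}\right)^2 \tag{5.2}$$

where $Q_i$ and $U_i$ are the parameter estimate and variance estimate, respectively, based on the i-th 
($i = 1, \ldots, m$) set of imputations, and $\bar{Q}$ is the average of the $Q_i$. The first term in (5.2) is called 
the within-imputation variability, and the second term is referred to as the between-imputation variability. 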
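
Rubin’s combining rule adds the average of the per-imputation variance estimates (within-imputation) to the sample variance of the per-imputation point estimates inflated by 1 + 1/m (between-imputation). A minimal Python sketch follows; the function name `rubin_total_variance` and the numbers in the example are made up for illustration.

```python
import statistics


def rubin_total_variance(estimates, variances):
    """Combine m point estimates and their variance estimates."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # divisor m - 1
    return within + (1 + 1 / m) * between


# Five sets of imputations, as in this study (values are made up).
q = [10.0, 10.4, 9.8, 10.1, 9.7]
u = [0.25, 0.25, 0.25, 0.25, 0.25]
total = rubin_total_variance(q, u)  # 0.25 + 1.2 * 0.075 = 0.34
```

The between-imputation term is never negative, which is why multiple imputation variance estimates are at least as large as the average single imputation variance estimate.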

Tables 5.2.3.1-5.2.3.5 present the relative biases of variance estimates of population mean 
estimates. The relative biases are defined as in (5.1). Multiple imputation variance estimates are 
generally larger than single imputation variance estimates since multiple imputation adds the 
between-imputation variation. 

If the data are missing completely at random, all methods except PROC IMPUTE and 
Schafer’s software substantially overestimate the variances. For the combined data with about 
25 percent missing values, the random, mean with disturbance, ratio with disturbance, and 
adjusted data augmentation methods all overestimate the variance by 25 percent to 35 percent, 
while ABB and BB methods overestimate the variances by 35 percent to 55 percent. Even with 
a 10 percent missing rate, these methods overestimate the variances by more than 10 percent in 
most cases. It seems that the second term in (5.2) is too much to add to the variance estimates. 
The ABB and BB methods, which introduce more variation than the random method and are 
considered “proper” by Rubin (1987), seem to overestimate the variances most seriously. 
PROC IMPUTE provides the best variance estimates with this ideal missing mechanism 
although it is not “proper” according to Rubin’s definition. Its multiple imputation variance 
estimates can be considered unbiased. Schafer’s software is the second best and it slightly 
overestimates the variances. 



For unconfounded missing mechanisms (2) and (3) where the incomplete data underestimate the 
variances, the multiple imputation variance estimates corrected more negative biases than the 
single imputation variance estimates, as expected. PROC IMPUTE and Schafer’s software 
again have the best overall performance. All other methods except the ratio with disturbance 
method produce fine variance estimates. The ratio with disturbance method significantly 
overestimates the variances even for these two missing mechanisms, where the incomplete data are 
more concentrated around the center than the population distribution. 

For the unconfounded missing mechanism (4), where center values are more likely missing and 
the incomplete data are more diversified than the population distribution, the relative 
performances across the different imputation methods are similar to those for the ideal missing 
mechanism (1). PROC IMPUTE works best and provides approximately unbiased variance 
estimates; Schafer’s software is the second best and slightly overestimates the variances. Other 
methods all overestimate the variances; the ABB and BB methods are the worst in terms of bias 
of variance estimates. 

For confounded missing mechanism (5) when the incomplete data seriously underestimate the 
variance, the extra variation introduced by multiple imputation helps reduce the negative biases 
of single imputation variance estimates for all methods except the ratio with disturbance 
imputation method. The ratio with disturbance imputation method again overestimates the 
variances. Except for the mixed distribution of normal and Chi-square, PROC IMPUTE has the 
largest negative biases and the ABB and BB methods have the smallest biases, while all the 
other methods are close to the ABB and BB methods. For the mixed right-skewed distribution 
of normal and Chi-square, PROC IMPUTE has the smallest negative biases; however, all 
methods except the ratio with disturbance method still substantially underestimate the variances. 

In summary, the ratio with disturbance imputation method always overestimates the variances 
for all types of missing mechanisms when between-imputation variation is introduced via multiple 
imputations. For this method, the idea of multiple imputation is obviously inappropriate. PROC 
IMPUTE seems to have the least between-imputation variation and it provides approximately 
unbiased variance estimates for the MCAR and all unconfounded missing mechanisms. The 
ABB and BB methods introduce the most between-imputation variation and most seriously 
overestimate the variances for the MCAR and missing mechanism (4) when the incomplete data 
are more diversified than the true distribution. For these two types of missing mechanisms, 
multiple imputation variance estimates of all methods except PROC IMPUTE tend to 
overestimate the true variances. For the other missing mechanisms when the incomplete data are 
less diversified than the true distribution, introducing between-imputation variation can help 
reduce the negative biases of variance estimates except for the ratio with disturbance method. 

Table 5.2.3.3 — Relative bias of variance estimates with five sets of imputations with about 20% missing values 

Missing Mechanism   Distribution   Random   Mean+e   Ratio+e   ABB   BB   Proc Impute   Schafer   Adj DA 

1. MCAR Normal 0.218 0.266 0.340 0.278 0.033 0.056 0.234 

Dexp 0.316 0.285 0.317 0.265 0.016 0.093 0.320 

5.2.4 Coverage rates 



The coverage rate is defined as the ratio of the number of simulation replications in which the 
confidence interval estimates cover the true value to the total number of simulation replications. 
Tables 5.2.4.1-5.2.4.5 report the coverage rates of the 95 percent confidence interval 
estimates covering the true means for the combined missing category and separate missing 
categories, respectively. 
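
The coverage rate defined above is a simple proportion, which the following Python sketch computes. The function name `coverage_rate` and the four example intervals are made up for illustration; how each interval is constructed is left to the caller.

```python
def coverage_rate(intervals, true_value):
    """Share of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Four replications with a true mean of 0: two intervals cover it.
intervals = [(-1.0, 1.0), (0.5, 2.0), (-2.0, -0.1), (-0.3, 0.4)]
rate = coverage_rate(intervals, 0.0)  # 0.5
```

In the study the proportion is taken over the 200 simulation replications, so a nominal 95 percent interval should cover the true mean in about 190 of them.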

Schafer’s software obviously has the best coverage rates. It has almost perfect rates across the 
five missing mechanisms for all missing rate categories. The adjusted data augmentation method 
also has almost perfect coverage rates for all missing rate categories and all missing mechanisms 
except mechanism (3). This method has fairly low coverage rates for this missing mechanism 
when missing rates are higher than 20 percent. The reason is that this method substantially 
underestimated the true mean for this missing mechanism. It seems that imputation methods 
based on Bayesian theory give better coverage rates under similar conditions, which concurs 
with Rubin’s point of view. 

Ratio and ratio with disturbance imputation methods have great coverage rates for missing 
mechanisms (2), (3), and (5) when tail values or large values are missing at higher probabilities. 
Although the two methods are not as good for missing mechanism (4) when the incomplete data 
are more diversified than the true distribution, they are still acceptable when missing rates are 
lower than 30 percent. With 40 percent missing values, the coverage rates of the two ratio 
imputation methods are moderately low (from 78 percent for the mixed distribution of normal and 
Chi-square to 90 percent for the normal distribution). This is because the two methods 
significantly overestimate the mean for this missing mechanism, as shown in our bias analyses. 

PROC IMPUTE has very good coverage rates except for missing mechanism (5). Some rates 
are low for mechanism (5) when missing rates are higher than 25 percent. The sequential hot 
deck method is significantly worse than PROC IMPUTE in terms of coverage rates, but it is 
better than the other methods which do not use any auxiliary information, especially for missing 
mechanism (3). Not much difference has been found among the mean imputation, random 
imputation, mean with disturbance imputation, ABB, and BB methods. The coverage rates of 
these methods are too low, especially for missing mechanisms (3) and (5), when missing rates 
are higher than 20 percent. 

Table 5.2.4.1 — Coverage rates with single imputation (overall *) 

Table 5.2.4.2 — Coverage rates with single imputation with about 10% missing values * 

Table 5.2.4.3 — Coverage rates with single imputation with about 20% missing values * 

Table 5.2.4.4 — Coverage rates with single imputation with about 30% missing values * 

Table 5.2.4.5 — Coverage rates with single imputation with about 40% missing values * 

5.2.5 Confidence interval width 



A 95 percent confidence interval width was obtained via the distribution of the 200 mean 
estimates based on the 200 simulation replications. The lower confidence limit was equal to the 
average of the fifth and sixth smallest estimates, and the upper confidence limit was equal to the 
average of the fifth and sixth largest estimates. A shorter confidence interval alone does not 
necessarily imply a better method. A method which provides shorter confidence intervals with 
higher coverage rates is generally preferred because the method is more likely to provide more 
concentrated point estimates around the true values. 
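
The empirical interval described above can be sketched in Python. With 200 estimates, averaging the fifth and sixth order statistics from each end places the limits at the 2.5 and 97.5 percent points. The function name `empirical_ci` and its parameter `k` are hypothetical, and the example input is made up.

```python
def empirical_ci(estimates, k=5):
    """Limits from the average of the k-th and (k+1)-th order statistics at each end."""
    s = sorted(estimates)
    lower = (s[k - 1] + s[k]) / 2     # k-th and (k+1)-th smallest
    upper = (s[-k] + s[-k - 1]) / 2   # k-th and (k+1)-th largest
    return lower, upper


# Made-up example: 200 "estimates" equal to 1, 2, ..., 200.
lower, upper = empirical_ci(list(range(1, 201)))
width = upper - lower  # 195.5 - 5.5 = 190.0
```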

Table 5.2.5.1 presents the confidence interval widths for the estimates based on the complete 
data and the data imputed by the 11 imputation methods. For missing mechanisms (2), (3), and 
(5), tail values or large values are more likely missing and the incomplete data are less diversified 
than the true distribution, and so are the imputed data. Therefore, the estimates based on the 
imputed data tend to have less variation than the complete data, and consequently the 
confidence intervals tend to be too short. This tendency can especially be seen in missing 
mechanism (5). The readers may need to compare the methods in terms of confidence interval 
widths along with the biases of variance estimates discussed in section 5.2.2 and coverage rates 
described in section 5.2.4. 

On the other hand, for missing mechanism (4), the incomplete data are more diversified than the 
complete data, and therefore the estimates based on the imputed data tend to have more 
variation. Consequently, the confidence intervals based on the imputed data tend to be too 
wide. 

Overall, Schafer’s software and the adjusted data augmentation method have the shortest 
confidence intervals across the five missing mechanisms. We found in the preceding section 
that these two methods also gave the best coverage rates, except for missing mechanism (3) with 
the adjusted data augmentation method. Therefore, the two methods are least likely to provide 
bad estimates. The other methods do not seem to have a substantial advantage over each other in 
terms of confidence interval width. 

Table 5.2.5.1 — Confidence interval width with single imputation (overall *) 

5.2.6 Bias of quartile estimates 



We obtained estimates of the median and the first and third quartiles for all imputed data to 
investigate how imputation affects the data distribution. Tables 5.2.6.1-5.2.6.3 give the biases 
of the first quartile, the third quartile, and the median estimates, respectively, for the combined 
missing rate categories. 

The mean imputation method is obviously the worst in terms of quartile estimates across all five 
missing mechanisms. The data are centralized so that the first quartiles are substantially 
overestimated, while the third quartiles are substantially underestimated. The median estimates 
are pretty much similar to those of the incomplete data. The only exceptions are the first quartile 
estimates for missing mechanism (3) in which the positive biases are very small. This is because 
both missing values created via missing mechanism (3) and the means imputed for the missing 
values are larger than the first quartiles so that the first quartile estimates based on the imputed 
data are very close to those based on the complete data. We will not include this method for 
discussion in this section. 

For the MCAR missing mechanism, all methods except the mean and the mean with disturbance 
imputation methods give fine estimates for all the quartiles. The mean with disturbance 
imputation method gives fine estimates for the normal and the contaminated normal distributions, 
but it has significantly larger negative biases of the first quartile estimates and significantly larger 
positive biases of the third quartile estimates for the double exponential distribution and the 
mixed distribution of normal and Chi-square. This implies that the disturbance drawn from 
$N(0, s_{obs}^2)$ made the imputed data more dispersed than the true data, where $s_{obs}^2$ is calculated 
from the observed data from the double exponential distribution or the mixed distribution of normal and Chi-square. 

For unconfounded missing mechanism (2), since the incomplete data are less diversified than the 
true distributions, the first quartiles are overestimated while the third quartiles are 
underestimated. Five methods — Schafer’s software, PROC IMPUTE, hot deck, ratio and ratio 
with disturbance imputation — all substantially reduce the biases of the first and third quartile 
estimates compared to the incomplete data. The adjusted data augmentation method has slight 
improvement for the third quartile estimates, but no improvement for the biases of the first 
quartile estimates. The random, mean with disturbance, ABB, and BB imputation methods do 
not improve the first and third quartile estimates compared to the incomplete data. For this 
missing mechanism, all methods provide fine median estimates because values are missing 
symmetrically at both tails. 

For unconfounded missing mechanism (3), since the incomplete data are less diversified than the 
true distributions, the first quartiles are overestimated while the third quartiles are underestimated 
by the incomplete data. Similar results to those for mechanism (2) have been found for the first 
and third quartile estimates. The biases of these quartile estimates based on the data imputed by 
Schafer’s software, ratio imputation, ratio with disturbance imputation, PROC IMPUTE, and 
hot deck are smaller than those based on the incomplete data by a factor of at least two. Among these five 
methods, hot deck is obviously worse than Schafer’s software, PROC IMPUTE, and the ratio 
imputation method. All other methods except the mean imputation method show some 
improvement over the incomplete data, but it is not substantial. For this missing mechanism, the 
medians are underestimated by the incomplete data. Schafer’s software and PROC IMPUTE 
reduce the negative biases by a factor of 4 to 50, while hot deck, ratio imputation, and ratio with 
disturbance imputation reduce the negative biases by a factor of 2 to 10. All other methods reduce 
the biases of the incomplete data median estimates slightly. 

For unconfounded missing mechanism (4), since the incomplete data are more diversified than 
the true distribution, the first quartiles are underestimated while the third quartiles are 
overestimated by the incomplete data. The hot deck method has the best overall performance in 
terms of biases of quartile estimates, followed by PROC IMPUTE and Schafer’s software. 
Among these three methods, Schafer’s software is best for normal distribution, but much worse 
than hot deck and PROC IMPUTE for the mixed distribution of normal and Chi-square. The 
other methods do not improve the biases over the incomplete data. Although the ratio 
imputation method shrinks the diversified incomplete data, the imputed data are shrunk too 
much so that they have less variation than the true distribution. The magnitudes of the biases of 
the first quartile estimates are larger than those of the incomplete data, but it is the other way 
around for the third quartile estimates. On the other hand, the random imputation, ABB, BB, 
and adjusted data augmentation methods have slightly better first quartile estimates but slightly 
worse third quartile estimates in terms of bias. All methods except ratio imputation and ratio 
with disturbance imputation provide as good median estimates as the incomplete data. Ratio 
imputation and ratio with disturbance imputation worsen the median estimates compared to the 
incomplete data. 

For confounded missing mechanism (5), since the incomplete data are less diversified than the 
true distributions, the first quartiles are overestimated while the third quartiles are underestimated 
by the incomplete data. The ratio with disturbance imputation method obviously has the best 
performance and reduces the biases of the incomplete-data quartile estimates by a factor of two to six. 
Ratio imputation and Schafer’s software also improve the quartile estimates over the incomplete 
data. The other methods slightly worsen the first quartile estimates while slightly improving the 
third quartile estimates. All methods give fine median estimates with this missing mechanism. 

Table 5.2.6.1 — Biases of the first quartile estimates (overall *) 

Table 5.2.6.2 — Biases of the third quartile estimates (overall *) 

Table 5.2.6.3 — Biases of median estimates (overall *) 

5.2.7 Average imputation error 



Average imputation error is defined as 



where m is the number of missing values, y_i is the true value that is intentionally set to missing, 
and y_i* is the imputed value for the i-th missing case. A smaller average imputation error 
implies only that the method provides imputations that are, on average, closer 
to the real values. This does not necessarily mean that it gives more accurate estimates for all 
types of statistics, although this is true in many situations. 
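
The displayed definition does not appear in this copy. One reading consistent with the symbols defined above, offered as an assumption and not as the report's own formula, is the root mean squared difference between the imputed and true values; the mean absolute difference is another plausible reading. The label AIE is introduced here only for illustration.

```latex
\[
\mathrm{AIE} \;=\; \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i^{*} - y_i\right)^{2}}
\]
```

Here $m$ is the number of missing values, $y_i$ is the true value, and $y_i^{*}$ is the imputed value for the $i$-th missing case.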

Tables 5.2.7.1-5.2.7.5 present average imputation errors for the combined missing rate 
categories and for each separate missing rate category, respectively. The figures in the tables have 
been standardized by dividing the original imputation errors by the true standard deviation. 
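
The standardization can be sketched in a few lines. The example below assumes that the average imputation error is the root mean squared difference between imputed and true values, which is an assumption and not a formula confirmed by this section. The function names, sample size, seed, and the 20% MCAR missing rate are likewise hypothetical.

```python
import math
import random
import statistics

def average_imputation_error(true_values, imputed_values):
    """Root mean squared difference between imputed and true values (assumed form)."""
    m = len(true_values)
    total = sum((imp - tru) ** 2 for tru, imp in zip(true_values, imputed_values))
    return math.sqrt(total / m)

def standardized_error(true_values, imputed_values, true_sd):
    """Divide the imputation error by the true standard deviation."""
    return average_imputation_error(true_values, imputed_values) / true_sd

rng = random.Random(2001)
true_sd = 2.0
y = [rng.gauss(10.0, true_sd) for _ in range(20000)]

# MCAR: every value has the same 20% chance of being set to missing.
is_missing = [rng.random() < 0.2 for _ in y]
observed = [v for v, miss in zip(y, is_missing) if not miss]
y_missing = [v for v, miss in zip(y, is_missing) if miss]

# Mean imputation fills every gap with the observed mean;
# random imputation fills each gap with a randomly drawn observed value.
mean_imputed = [statistics.fmean(observed)] * len(y_missing)
random_imputed = [rng.choice(observed) for _ in y_missing]

mean_err = standardized_error(y_missing, mean_imputed, true_sd)
random_err = standardized_error(y_missing, random_imputed, true_sd)
print(f"mean imputation: {mean_err:.3f}, random imputation: {random_err:.3f}")
```

Under these assumptions, mean imputation gives a standardized error near 1 and random imputation gives one near the square root of 2, because the difference between two independent draws has twice the variance of a single draw.
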

Across all missing mechanisms, the random imputation, mean with disturbance imputation, ABB, 
BB, and adjusted data augmentation methods all have similar imputation errors, which are 
significantly larger than those of the other methods for almost all distributions, all 
missing rates, and all missing categories. 

The ratio imputation method always has the smallest or close to smallest average imputation 
errors. Schafer’s software and PROC IMPUTE are competitive candidates. These three 
methods have substantially smaller average imputation errors than the others. The hot deck, 
ratio with disturbance imputation, and mean imputation methods sit in the middle in terms of 
average imputation error. They are significantly worse than the three best methods, but they are 
better than the worst five methods. Mean imputation has very small imputation errors for missing 
mechanism (4) because center values are more likely to be missing under this mechanism and 
the mean imputation method imputes the mean value for them. 
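
The effect described for mean imputation can be reproduced with a toy example. In the sketch below, the rule that makes central values more likely to be missing is hypothetical and only stands in for the report's mechanism (4), whose definition is not given in this section. The error is again assumed to be a root mean squared difference, and all names and constants are illustrative.

```python
import math
import random
import statistics

def rms_error(true_values, imputed_values):
    """Root mean squared imputation error (assumed form of the average imputation error)."""
    squared = [(imp - tru) ** 2 for tru, imp in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))

def mean_imputation_error(values, is_missing):
    """Impute the observed mean for every missing value and return the RMS error."""
    observed = [v for v, miss in zip(values, is_missing) if not miss]
    missing = [v for v, miss in zip(values, is_missing) if miss]
    fill = statistics.fmean(observed)
    return rms_error(missing, [fill] * len(missing))

rng = random.Random(5)
# The true standard deviation is 1, so the errors are already standardized.
y = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# MCAR: every value has the same 30% chance of being missing.
mcar_flags = [rng.random() < 0.3 for _ in y]

# Hypothetical rule: the chance of being missing is highest at the mean
# and decays as the value moves into the tails.
center_flags = [rng.random() < 0.6 * math.exp(-v * v) for v in y]

mcar_error = mean_imputation_error(y, mcar_flags)
center_error = mean_imputation_error(y, center_flags)
print(f"MCAR: {mcar_error:.3f}, center more likely missing: {center_error:.3f}")
```

When the missing values cluster near the center, the mean is close to each of them, so the error of mean imputation falls well below its MCAR level.
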

Most methods give fairly consistent average imputation errors across distributions. However, 
PROC IMPUTE and hot deck have much larger average imputation errors for the mixed 
distribution of normal and chi-square than they do for the other three distributions, for all 
missing mechanisms except mechanism (4). This probably indicates that these two methods are 
not very good at recovering missing values that are large or lie in the tails. 

The relative performance of the imputation methods in terms of average imputation error is very 
consistent across the missing rate categories. 







Table 5.2.7.1 — Average imputation error (overall *) 

Table 5.2.7.2 — Average imputation error with about 10% missing values 

Missing Mechanism  Distribution  Mean Imp.  Ratio Imp.  Hot Deck  Random  Mean +e  Ratio +e  ABB  BB  Proc Impute  Schafer  Adj DA 

1. MCAR  Normal  1.061 0.822 1.450 1.375 1.440 1.452 0.641 0.632 1.458 

Table 5.2.7.3 — Average imputation error with about 20% missing values 

Missing Mechanism  Distribution  Mean Imp.  Ratio Imp.  Hot Deck  Random  Mean +e  Ratio +e  ABB  BB  Proc Impute  Schafer  Adj DA 

1. MCAR  Normal  0.983 0.855 1.430 1.389 1.377 1.416 0.609 0.583 1.434 

Table 5.2.7.4 — Average imputation error with about 30% missing values 

Missing Mechanism  Distribution  Mean Imp.  Ratio Imp.  Hot Deck  Random  Mean +e  Ratio +e  ABB  BB  Proc Impute  Schafer  Adj DA 

1. MCAR  Normal  1.014 0.991 1.381 1.413 1.431 1.433 0.669 0.608 1.386 

Table 5.2.7.5 — Average imputation error with about 40% missing values * 

Missing Mechanism  Distribution  Mean Imp.  Ratio Imp.  Hot Deck  Random  Mean +e  Ratio +e  ABB  BB  Proc Impute  Schafer  Adj DA 

1. MCAR  Normal  1.005 1.043 1.390 1.417 1.394 1.435 0.709 0.614 1.393 
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The Schools and Staffing Survey (SASS) for 1998-99: Design Recommendations to 
Inform Broad Education Policy 

Should SASS Measure Instructional Processes and Teacher Effectiveness? 

Making Data Relevant for Policy Discussions: Redesigning the School Administrator 
Questionnaire for the 1998-99 SASS 
1998-99 Schools and Staffing Survey: Issues Related to Survey Depth 
Towards an Organizational Database on America’s Schools: A Proposal for the Future of 
SASS, with comments on School Reform, Governance, and Finance 
Predictors of Retention, Transfer, and Attrition of Special and General Education 
Teachers: Data from the 1989 Teacher Followup Survey 
Nested Structures: District-Level Data in the Schools and Staffing Survey 
Linking Student Data to SASS: Why, When, How 
National Assessments of Teacher Quality 

Measures of Inservice Professional Development: Suggested Items for the 1998-1999 
Schools and Staffing Survey 

Student Learning, Teaching Quality, and Professional Development: Theoretical 

Linkages, Current Measurement, and Recommendations for Future Data Collection 
Selected Papers on Education Surveys: Papers Presented at the 1996 Meeting of the 
American Statistical Association 



NCES contact 
Stephen Broughman 
Steven Kaufman 
Dan Kasprzyk 

Stephen Broughman 



Steven Kaufman 



Dan Kasprzyk 

Dan Kasprzyk 
Dan Kasprzyk 
Dan Kasprzyk 

Dan Kasprzyk 

Dan Kasprzyk 

Dan Kasprzyk 

Dan Kasprzyk 
Dan Kasprzyk 
Dan Kasprzyk 
Dan Kasprzyk 

Sharon Bobbitt & 
John Ralph 
Samuel Peng 
Samuel Peng 

Sharon Bobbitt 

Steven Kaufman 
Dan Kasprzyk 

Dan Kasprzyk 

Dan Kasprzyk 

Dan Kasprzyk 
Dan Kasprzyk 

Dan Kasprzyk 
Dan Kasprzyk 

Dan Kasprzyk 
Dan Kasprzyk 

Dan Kasprzyk 

Dan Kasprzyk 
Dan Kasprzyk 
Dan Kasprzyk 
Dan Kasprzyk 

Mary Rollefson 

Dan Kasprzyk 



No. 

97-07 

97-09 

97-10 

97-11 

97-12 

97-14 

97-18 

97-22 

97-23 

97-41 

97-42 

97- 44 

98- 01 
98-02 
98-04 
98-05 

98-08 

98-12 

98-13 

98-14 

98-15 

98-16 

1999-02 

1999-04 

1999-07 

1999-08 

1999-10 

1999-12 

1999-13 

1999-14 

1999- 17 

2000- 04 

2000-10 

2000-13 

2000-18 



Title 

The Determinants of Per-Pupil Expenditures in Private Elementary and Secondary 
Schools: An Exploratory Analysis 
Status of Data on Crime and Violence in Schools: Final Report 

Report of Cognitive Research on the Public and Private School Teacher Questionnaires 
for the Schools and Staffing Survey 1993-94 School Year 
International Comparisons of Inservice Professional Development 
Measuring School Reform: Recommendations for Future SASS Data Collection 
Optimal Choice of Periodicities for the Schools and Staffing Survey: Modeling and 
Analysis 

Improving the Mail Return Rates of SASS Surveys: A Review of the Literature 
Collection of Private School Finance Data: Development of a Questionnaire 
Further Cognitive Research on the Schools and Staffing Survey (SASS) Teacher Listing 
Form 

Selected Papers on the Schools and Staffing Survey: Papers Presented at the 1 997 Meeting 
of the American Statistical Association 

Improving the Measurement of Staffing Resources at the School Level: The Development 
of Recommendations for NCES for the Schools and Staffing Survey (SASS) 
Development of a SASS 1 993-94 School-Level Student Achievement Subfile: Using 
State Assessments and State NAEP, Feasibility Study 
Collection of Public School Expenditure Data: Development of a Questionnaire 
Response Variance in the 1993-94 Schools and Staffing Survey: A Reinterview Report 
Geographic Variations in Public Schools’ Costs 

SASS Documentation: 1 993-94 SASS Student Sampling Problems; Solutions for 

Determining the Numerators for the SASS Private School (3B) Second-Stage Factors 
The Redesign of the Schools and Staffing Survey for 1999-2000: A Position Paper 
A Bootstrap Variance Estimator for Systematic PPS Sampling 
Response Variance in the 1994-95 Teacher Follow-up Survey 
Variance Estimation of Imputed Survey Data 

Development of a Prototype System for Accessing Linked NCES Data 
A Feasibility Study of Longitudinal Design for Schools and Staffing Survey 
Tracking Secondary Use of the Schools and Staffing Survey Data: Preliminary Results 
Measuring Teacher Qualifications 

Collection of Resource and Expenditure Data on the Schools and Staffing Survey 
Measuring Classroom Instructional Processes: Using Survey and Case Study Fieldtest 
Results to Improve Item Construction 
What Users Say About Schools and Staffing Survey Publications 
1993-94 Schools and Staffing Survey: Data File User’s Manual, Volume III: Public-Use 
Codebook 

1993- 94 Schools and Staffing Survey: Data File User’s Manual, Volume IV: Bureau of 
Indian Affairs (BIA) Restricted-Use Codebook 

1994- 95 Teacher Followup Survey: Data File User’s Manual, Restricted-Use Codebook 
Secondary Use of the Schools and Staffing Survey Data 

Selected Papers on Education Surveys: Papers Presented at the 1998 and 1999 ASA and 
1 999 AAPOR Meetings 

A Research Agenda for the 1 999-2000 Schools and Staffing Survey 
Non-professional Staff in the Schools and Staffing Survey (SASS) and Common Core of 
Data (CCD) 

Feasibility Report: School-Level Finance Pretest, Public School District Questionnaire 



NCES contact 
Stephen Broughman 

Lee Hoffman 
Dan Kasprzyk 

Dan Kasprzyk 
Mary Rollefson 
Steven Kaufman 

Steven Kaufman 
Stephen Broughman 
Dan Kasprzyk 

Steve Kaufman 

Mary Rollefson 

Michael Ross 

Stephen Broughman 
Steven Kaufman 
William J. Fowler, Jr. 
Steven Kaufman 

Dan Kasprzyk 
Steven Kaufman 
Steven Kaufman 
Steven Kaufman 
Steven Kaufman 
Stephen Broughman 
Dan Kasprzyk 
Dan Kasprzyk 
Stephen Broughman 
Dan Kasprzyk 

Dan Kasprzyk 
Kerry Gruber 

Kerry Gruber 

Kerry Gruber 
Susan Wiley 
Dan Kasprzyk 

Dan Kasprzyk 
Kerry Gruber 

Stephen Broughman 



Third International Mathematics and Science Study (TIMSS) 

2001-01 Cross-National Variation in Educational Preparation for Adulthood: From Early 
Adolescence to Young Adulthood 

2001-05 Using TIMSS to Analyze Correlates of Performance Variation in Mathematics 
2001-07 A Comparison of the National Assessment of Educational Progress (NAEP), the Third 

International Mathematics and Science Study Repeat (TIMSS-R), and the Programme 
for International Student Assessment (PISA) 



Elvira Hausken 

Patrick Gonzales 
Arnold Goldstein 



Listing of NCES Working Papers by Subject 



No. Title 



Achievement (student) - mathematics 

2001-05 Using TIMSS to Analyze Correlates of Performance Variation in Mathematics 



Adult education 

96-14 The 1995 National Household Education Survey: Reinterview Results for the Adult 
Education Component 

96-20 1991 National Household Education Survey (NHES:91) Questionnaires: Screener, Early 

Childhood Education, and Adult Education 

96-22 1995 National Household Education Survey (NHES:95) Questionnaires: Screener, Early 

Childhood Program Participation, and Adult Education 
98-03 Adult Education in the 1990s: A Report on the 1991 National Household Education 
Survey 

98-10 Adult Education Participation Decisions and Barriers: Review of Conceptual Frameworks 
and Empirical Studies 

1 999- 1 1 Data Sources on Lifelong Learning Available from the National Center for Education 

Statistics 

2000- 1 6a Lifelong Learning NCES Task Force: Final Report Volume I 

2000-1 6b Lifelong Learning NCES Task Force: Final Report Volume II 



Adult literacy — see Literacy of adults 



American Indian - education 

1 999-13 1 993-94 Schools and Staffing Survey: Data File User’s Manual, Volume IV: Bureau of 

Indian Affairs (BIA) Restricted-Use Codebook 



Assessment/achievement 
95-12 Rural Education Data User’s Guide 

95-13 Assessing Students with Disabilities and Limited English Proficiency 

97-29 Can State Assessment Data be Used to Reduce State NAEP Sample Sizes? 

97-30 ACT’s NAEP Redesign Project: Assessment Design is the Key to Useful and Stable 
Assessment Results 

97-3 1 NAEP Reconfigured: An Integrated Redesign of the National Assessment of Educational 
Progress 

97-32 Innovative Solutions to Intractable Large Scale Assessment (Problem 2: Background 
Questions) 

97-37 Optimal Rating Procedures and Methodology for NAEP Open-ended Items 

97- 44 Development of a SASS 1 993-94 School-Level Student Achievement Subfile: Using 

State Assessments and State NAEP, Feasibility Study 

98- 09 High School Curriculum Structure: Effects on Coursetaking and Achievement in 

Mathematics for High School Graduates — An Examination of Data from the National 
Education Longitudinal Study of 1988 

2001-07 A Comparison of the National Assessment of Educational Progress (NAEP), the Third 
International Mathematics and Science Study Repeat (TIMSS-R), and the Programme 
for International Student Assessment (PISA) 

2001-1 1 Impact of Selected Background Variables on Students’ NAEP Math Performance 
2001-13 The Effects of Accommodations on the Assessment of LEP Students in NAEP 



Beginning students in postsecondary education 

98-1 1 Beginning Postsecondary Students Longitudinal Study First Follow-up (BPS:96-98) Field 
Test Report 

2001-04 Beginning Postsecondary Students Longitudinal Study: 1996-2001 (BPS: 1996/2001) 

Field Test Methodology Report 



NCES contact 

Patrick Gonzales 

Steven Kaufman 

Kathryn Chandler 

Kathryn Chandler 

Peter Stowe 

Peter Stowe 

Lisa Hudson 

Lisa Hudson 
Lisa Hudson 

Kerry Gruber 

Samuel Peng 
James Houser 
Larry Ogle 
Larry Ogle 

Larry Ogle 

Larry Ogle 

Larry Ogle 
Michael Ross 

Jeffrey Owings 

Arnold Goldstein 

Arnold Goldstein 
Arnold Goldstein 

Aurora D’Amico 
Paula Knepper 
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No. 



Title 



NCES contact 



Civic participation 

97- 25 1 996 National Household Education Survey (NHES:96) Questionnaires: 

Screener/Household and Library, Parent and Family Involvement in Education and 
Civic Involvement, Youth Civic Involvement, and Adult Civic Involvement 

Climate of schools 

95-14 Empirical Evaluation of Social, Psychological, & Educational Construct Variables Used 
in NCES Surveys 

Cost of education indices 

94- 05 Cost-of-Education Differentials Across the States 

Course-taking 

95- 12 Rural Education Data User’s Guide 

98- 09 High School Curriculum Structure: Effects on Coursetaking and Achievement in 

Mathematics for High School Graduates — An Examination of Data from the National 
Education Longitudinal Study of 1988 
1 999-05 Procedures Guide for Transcript Studies 
1 999-06 1 998 Revision of the Secondary School Taxonomy 

Crime 

97- 09 Status of Data on Crime and Violence in Schools: Final Report 

Curriculum 

95-1 1 Measuring Instruction, Curriculum Content, and Instructional Resources: The Status of 
Recent Work 

98- 09 High School Curriculum Structure: Effects on Coursetaking and Achievement in 

Mathematics for High School Graduates — An Examination of Data from the National 
Education Longitudinal Study of 1988 

Customer service 

1 999- 10 What Users Say About Schools and Staffing Survey Publications 

2000- 02 Coordinating NCES Surveys: Options, Issues, Challenges, and Next Steps 

2000- 04 Selected Papers on Education Surveys: Papers Presented at the 1998 and 1999 ASA and 

1999 AAPOR Meetings 

2001- 12 Customer Feedback on the 1990 Census Mapping Project 

Data quality 

97-13 Improving Data Quality in NCES: Database-to-Report Process 
2001-1 1 Impact of Selected Background Variables on Students’ NAEP Math Performance 
2001-13 The Effects of Accommodations on the Assessment of LEP Students in NAEP 

Data warehouse 

2000-04 Selected Papers on Education Surveys: Papers Presented at the 1998 and 1999 ASA and 
1999 AAPOR Meetings 



Design effects 

2000-03 Strengths and Limitations of Using SUDAAN, Stata, and WesVarPC for Computing 
Variances from NCES Data Sets 

Dropout rates, high school 

95- 07 National Education Longitudinal Study of 1988: Conducting Trend Analyses HS&B and 

NELS:88 Sophomore Cohort Dropouts 

Early childhood education 

96- 20 1991 National Household Education Survey (NHES:91) Questionnaires: Screener, Early 

Childhood Education, and Adult Education 



Kathryn Chandler 



Samuel Peng 



William J. Fowler, Jr. 



Samuel Peng 
Jeffrey Owing s 



Dawn Nelson 
Dawn Nelson 



Lee Hoffman 



Sharon Bobbitt & 
John Ralph 
Jeffrey Owings 



Dan Kasprzyk 
Valena Plisko 
Dan Kasprzyk 

Dan Kasprzyk 



Susan Ahmed 
Arnold Goldstein 
Arnold Goldstein 



Dan Kasprzyk 



Ralph Lee 



Jeffrey Owings 



Kathryn Chandler 
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No. Title 

96- 22 1995 National Household Education Survey (NHES:95) Questionnaires: Screener, Early 

Childhood Program Participation, and Adult Education 

97- 24 Formulating a Design for the ECLS: A Review of Longitudinal Studies 

97- 36 Measuring the Quality of Program Environments in Head Start and Other Early Childhood 

Programs: A Review and Recommendations for Future Research 
1 999-01 A Birth Cohort Study: Conceptual and Design Considerations and Rationale 
2001-02 Measuring Father Involvement in Young Children's Lives: Recommendations for a 
Fatherhood Module for the ECLS-B 

2001-03 Measures of Socio-Emotional Development in Middle School 

2001-06 Papers from the Early Childhood Longitudinal Studies Program: Presented at the 2001 
AERA and SRCD Meetings 

Educational attainment 

98- 1 1 Beginning Postsecondary Students Longitudinal Study First Follow-up (BPS:96-98) Field 

Test Report 

2001-15 Baccalaureate and Beyond Longitudinal Study: 2000/01 Follow-Up Field Test 
Methodology Report 



Educational research 

2000- 02 Coordinating NCES Surveys: Options, Issues, Challenges, and Next Steps 

Eighth-graders 

2001- 05 Using TIMSS to Analyze Correlates of Performance Variation in Mathematics 

Employment 

96-03 National Education Longitudinal Study of 1988 (NELS:88) Research Framework and 
Issues 

98-1 1 Beginning Postsecondary Students Longitudinal Study First Follow-up (BPS:96-98) Field 
Test Report 

2000- 16a Lifelong Learning NCES Task Force: Final Report Volume I 

2000- 1 6b Lifelong Learning NCES Task Force: Final Report Volume II 

2001- 01 Cross-National Variation in Educational Preparation for Adulthood: From Early 

Adolescence to Young Adulthood 

Employment - after college 

2001-15 Baccalaureate and Beyond Longitudinal Study: 2000/01 Follow-Up Field Test 
Methodology Report 



Engineering 

2000- 1 1 Financial Aid Profile of Graduate Students in Science and Engineering 

Enrollment - after college 

2001- 15 Baccalaureate and Beyond Longitudinal Study: 2000/01 Follow-Up Field Test 

Methodology Report 

Faculty - higher education 

97- 26 Strategies for Improving Accuracy of Postsecondary Faculty Lists 

2000- 01 1999 National Study of Postsecondary Faculty (NSOPF:99) Field Test Report 

Fathers - role in education 

2001- 02 Measuring Father Involvement in Young Children’s Lives: Recommendations for a 

Fatherhood Module for the ECLS-B 

Finance - elementary and secondary schools 

94-05 Cost-of-Education Differentials Across the States 
96-1 9 Assessment and Analysis of School-Level Expenditures 

98- 01 Collection of Public School Expenditure Data: Development of a Questionnaire 

1 999-07 Collection of Resource and Expenditure Data on the Schools and Staffing Survey 



NCES contact 
Kathryn Chandler 

Jerry West 
Jerry West 

Jerry West 
Jerry West 

Elvira Hausken 
Jerry West 



Aurora D’Amico 
Andrew G. Malizio 



Valena Plisko 



Patrick Gonzales 



Jeffrey Owings 

Aurora D’Amico 

Lisa Hudson 
Lisa Hudson 
Elvira Hausken 



Andrew G. Malizio 



Aurora D’Amico 



Andrew G. Malizio 



Linda Zimbler 
Linda Zimbler 



Jerry West 



William J. Fowler, Jr. 
William J. Fowler, Jr. 
Stephen Broughman 
Stephen Broughman 
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No. Title 

1 999- 1 6 Measuring Resources in Education: From Accounting to the Resource Cost Model 

Approach 

2000- 18 Feasibility Report: School-Level Finance Pretest, Public School District Questionnaire 

2001- 14 Evaluation of the Common Core of Data (CCD) Finance Data Imputations 



Finance - postsecondary 

97-27 Pilot Test of IPEDS Finance Survey 

2000-14 IPEDS Finance Data Comparisons Under the 1997 Financial Accounting Standards for 
Private, Not-for-Profit Institutes: A Concept Paper 



Finance - 

95- 17 

96- 16 

97- 07 

97-22 

1999- 07 

2000- 15 



private schools 

Estimates of Expenditures for Private K-12 Schools 
Strategies for Collecting Finance Data from Private Schools 

The Determinants of Per-Pupil Expenditures in Private Elementary and Secondary 
Schools: An Exploratory Analysis 

Collection of Private School Finance Data: Development of a Questionnaire 
Collection of Resource and Expenditure Data on the Schools and Staffing Survey 
Feasibility Report: School-Level Finance Pretest, Private School Questionnaire 



Geography 

98-04 Geographic Variations in Public Schools’ Costs 



Graduate students 

2000-1 1 Financial Aid Profile of Graduate Students in Science and Engineering 



Graduates of postsecondary education 

2001-1 5 Baccalaureate and Beyond Longitudinal Study: 2000/0 1 Follow-Up Field Test 
Methodology Report 



Imputation 

2000- 04 

2001 - 10 
2001-14 
2001-16 
2001-17 



Selected Papers on Education Surveys: Papers Presented at the 1998 and 1999 ASA and 
1999 AAPOR Meeting 

Comparison of Proc Impute and Schafer’s Multiple Imputation Software 
Evaluation of the Common Core of Data (CCD) Finance Data Imputations 
Imputation of Test Scores in the National Education Longitudinal Study of 1988 
A Study of Imputation Algorithms 



Inflation 

97-43 Measuring Inflation in Public School Costs 



Institution data 

2000-01 1999 National Study of Postsecondary Faculty (NSOPF:99) Field Test Report 



Instructional resources and practices 

95-1 1 Measuring Instruction, Curriculum Content, and Instructional Resources: The Status of 
Recent Work 

1999-08 Measuring Classroom Instructional Processes: Using Survey and Case Study Field Test 
Results to Improve Item Construction 



International comparisons 

97-1 1 International Comparisons of Inservice Professional Development 
97-16 International Education Expenditure Comparability Study: Final Report, Volume I 

97-17 International Education Expenditure Comparability Study: Final Report, Volume II, 

Quantitative Analysis of Expenditure Comparability 
2001-01 Cross-National Variation in Educational Preparation for Adulthood: From Early 
Adolescence to Young Adulthood 

2001-07 A Comparison of the National Assessment of Educational Progress (NAEP), the Third 
International Mathematics and Science Study Repeat (TIMSS-R), and the Programme 
for International Student Assessment (PISA) 



NCES contact 
William J. Fowler, Jr. 

Stephen Brough man 
Frank Johnson 



Peter Stowe 
Peter Stowe 



Stephen Broughman 
Stephen Broughman 
Stephen Broughman 

Stephen Broughman 
Stephen Broughman 
Stephen Broughman 



William J. Fowler, Jr. 



Aurora D’Amico 



Andrew G. Malizio 



Dan Kasprzyk 

Sam Peng 
Frank Johnson 
Ralph Lee 
Ralph Lee 



William J. Fowler, Jr. 



Linda Zimbler 



Sharon Bobbitt & 
John Ralph 
Dan Kasprzyk 



Dan Kasprzyk 
Shelley Bums 
Shelley Bums 

Elvira Hausken 

Arnold Goldstein 
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No. 



Title 



NCES contact 



International comparisons - math and science achievement 

2001-05 Using TIMSS to Analyze Correlates of Performance Variation in Mathematics 

Libraries 

94- 07 Data Comparability and Public Policy: New Interest in Public Library Data Papers 

Presented at Meetings of the American Statistical Association 
97-25 1996 National Household Education Survey (NHES:96) Questionnaires: 

Screener/Household and Library, Parent and Family Involvement in Education and 
Civic Involvement, Youth Civic Involvement, and Adult Civic Involvement 

Limited English Proficiency 

95- 13 Assessing Students with Disabilities and Limited English Proficiency 

2001-1 1 Impact of Selected Background Variables on Students’ NAEP Math Performance 
2001-13 The Effects of Accommodations on the Assessment of LEP Students in NAEP 



Literacy of adults 

98-1 7 Developing the National Assessment of Adult Literacy: Recommendations from 

Stakeholders 

1 999-09a 1 992 National Adult Literacy Survey: An Overview 

1 999-09b 1992 National Adult Literacy Survey: Sample Design 

1999-09c 1992 National Adult Literacy Survey: Weighting and Population Estimates 

1 999-09d 1 992 National Adult Literacy Survey: Development of the Survey Instruments 

1 999-09e 1992 National Adult Literacy Survey: Scaling and Proficiency Estimates 

1 999-09f 1 992 National Adult Literacy Survey: Interpreting the Adult Literacy Scales and Literacy 

Levels 

1 999-09g 1 992 National Adult Literacy Survey: Literacy Levels and the Response Probability 

Convention 

1 999- 1 1 Data Sources on Lifelong Learning Available from the National Center for Education 

Statistics 

2000- 05 Secondary Statistical Modeling With the National Assessment of Adult Literacy: 

Implications for the Design of the Background Questionnaire 

2000-06 Using Telephone and Mail Surveys as a Supplement or Alternative to Door-to-Door 
Surveys in the Assessment of Adult Literacy 

2000-07 “How Much Literacy is Enough?” Issues in Defining and Reporting Performance 
Standards for the National Assessment of Adult Literacy 

2000-08 Evaluation of the 1 992 NALS Background Survey Questionnaire: An Analysis of Uses 
with Recommendations for Revisions 

2000- 09 Demographic Changes and Literacy Development in a Decade 

2001- 08 Assessing the Lexile Framework: Results of a Panel Meeting 



Literacy of adults - international 

97- 33 Adult Literacy: An International Perspective 

Mathematics 

98- 09 High School Curriculum Structure: Effects on Coursetaking and Achievement in 

Mathematics for High School Graduates — An Examination of Data from the National 
Education Longitudinal Study of 1988 

1999-08 Measuring Classroom Instructional Processes: Using Survey and Case Study Field Test 
Results to Improve Item Construction 

2001-05 Using TIMSS to Analyze Correlates of Performance Variation in Mathematics 

2001-07 A Comparison of the National Assessment of Educational Progress (NAEP), the Third 
International Mathematics and Science Study Repeat (TIMSS-R), and the Programme 
for International Student Assessment (PISA) 

2001-11 Impact of Selected Background Variables on Students’ NAEP Math Performance 



Parental involvement in education 

96-03 National Education Longitudinal Study of 1 988 (NELS:88) Research Framework and 
Issues 
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Patrick Gonzales 

Carrol Kindel 
Kathryn Chandler 

James Houser 
Arnold Goldstein 
Arnold Goldstein 

Sheida White 

Alex Sedlacek 
Alex Sedlacek 
Alex Sedlacek 
Alex Sedlacek 
Alex Sedlacek 
Alex Sedlacek 

Alex Sedlacek 

Lisa Hudson 

Sheida White 

Sheida White 

Sheida White 

Sheida White 

Sheida White 
Sheida White 

Marilyn Binkley 

Jeffrey Owings 

Dan Kasprzyk 

Patrick Gonzales 
Arnold Goldstein 

Arnold Goldstein 

Jeffrey Owings 



No. Title 

97- 25 1 996 National Household Education Survey (NHES:96) Questionnaires: 

Screener/Household and Library, Parent and Family Involvement in Education and 
Civic Involvement, Youth Civic Involvement, and Adult Civic Involvement 
1 999-01 A Birth Cohort Study: Conceptual and Design Considerations and Rationale 
2001-06 Papers from the Early'Childhood Longitudinal Studies Program: Presented at the 2001 
AERA and SRCD Meetings 

Participation rates 

98- 1 0 Adult Education Participation Decisions and Barriers: Review of Conceptual Frameworks 

and Empirical Studies 

Postsecondary education 

1 999- 1 1 Data Sources on Lifelong Learning Available from the National Center for Education 

Statistics 

2000- 1 6a Lifelong Learning NCES Task Force: Final Report Volume I 

2000-1 6b Lifelong Learning NCES Task Force: Final Report Volume II 

Postsecondary education - persistence and attainment 

98-1 1 Beginning Postsecondary Students Longitudinal Study First Follow-up (BPS:96-98) Field 
Test Report 

1 999- 1 5 Projected Postsecondary Outcomes of 1 992 High School Graduates 

Postsecondary education - staff 

97-26 Strategies for Improving Accuracy of Postsecondary Faculty Lists 

2000- 01 1999 National Study of Postsecondary Faculty (NSOPF:99) Field Test Report 

Principals 

2000-1 0 A Research Agenda for the 1 999-2000 Schools and Staffing Survey 

Private schools 

96- 16 Strategies for Collecting Finance Data from Private Schools 

97- 07 The Determinants of Per-Pupil Expenditures in Private Elementary and Secondary 

Schools: An Exploratory Analysis 

97- 22 Collection of Private School Finance Data: Development of a Questionnaire 

2000-1 3 Non-professional Staff in the Schools and Staffing Survey (SASS) and Common Core of 
Data (CCD) 

2000-15 Feasibility Report: School-Level Finance Pretest, Private School Questionnaire 

Projections of education statistics 

1 999-1 5 Projected Postsecondary Outcomes of 1 992 High School Graduates 

Public school finance 

1 999- 1 6 Measuring Resources in Education: From Accounting to the Resource Cost Model 

Approach 

2000- 1 8 Feasibility Report: School-Level Finance Pretest, Public School District Questionnaire 

Public schools 

97^13 Measuring Inflation in Public School Costs 

98- 01 Collection of Public School Expenditure Data: Development of a Questionnaire 

98-04 Geographic Variations in Public Schools’ Costs 

1 999- 02 Tracking Secondary Use of the Schools and Staffing Survey Data: Preliminary Results 

2000- 12 Coverage Evaluation of the 1994-95 Public Elementary/Secondary School Universe 

Survey 

2000-1 3 Non-professional Staff in the Schools and Staffing Survey (SASS) and Common Core of 
Data (CCD) 
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Kathryn Chandler 

Jerry West 
Jerry West 



Peter Stowe 



Lisa Hudson 

Lisa Hudson 
Lisa Hudson 



Aurora D’Amico 
Aurora D’Amico 



Linda Zimbler 
Linda Zimbler 



Dan Kasprzyk 



Stephen Broughman 
Stephen Broughman 

Stephen Broughman 
Kerry Gruber 

Stephen Broughman 



Aurora D’Amico 



William J. Fowler, Jr. 
Stephen Broughman 



William J. Fowler, Jr. 
Stephen Broughman 
William J. Fowler, Jr. 
Dan Kasprzyk 
Beth Young 

Kerry Gruber 
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No. 



Title 



NCES contact 



Public schools - secondary

98-09  High School Curriculum Structure: Effects on Coursetaking and Achievement in Mathematics for High School Graduates — An Examination of Data from the National Education Longitudinal Study of 1988    Jeffrey Owings

Reform, educational

96-03  National Education Longitudinal Study of 1988 (NELS:88) Research Framework and Issues    Jeffrey Owings

Response rates

98-02  Response Variance in the 1993-94 Schools and Staffing Survey: A Reinterview Report    Steven Kaufman

School districts

2000-10  A Research Agenda for the 1999-2000 Schools and Staffing Survey    Dan Kasprzyk

School districts, public

98-07  Decennial Census School District Project Planning Report    Tai Phan
1999-03  Evaluation of the 1996-97 Nonfiscal Common Core of Data Surveys Data Collection, Processing, and Editing Cycle    Beth Young

School districts, public - demographics of

96-04  Census Mapping Project/School District Data Book    Tai Phan

Schools

97-42  Improving the Measurement of Staffing Resources at the School Level: The Development of Recommendations for NCES for the Schools and Staffing Survey (SASS)    Mary Rollefson
98-08  The Redesign of the Schools and Staffing Survey for 1999-2000: A Position Paper    Dan Kasprzyk
1999-03  Evaluation of the 1996-97 Nonfiscal Common Core of Data Surveys Data Collection, Processing, and Editing Cycle    Beth Young
2000-10  A Research Agenda for the 1999-2000 Schools and Staffing Survey    Dan Kasprzyk

Schools - safety and discipline

97-09  Status of Data on Crime and Violence in Schools: Final Report    Lee Hoffman

Science

2000-11  Financial Aid Profile of Graduate Students in Science and Engineering    Aurora D’Amico
2001-07  A Comparison of the National Assessment of Educational Progress (NAEP), the Third International Mathematics and Science Study Repeat (TIMSS-R), and the Programme for International Student Assessment (PISA)    Arnold Goldstein

Software evaluation

2000-03  Strengths and Limitations of Using SUDAAN, Stata, and WesVarPC for Computing Variances from NCES Data Sets    Ralph Lee

Staff

97-42  Improving the Measurement of Staffing Resources at the School Level: The Development of Recommendations for NCES for the Schools and Staffing Survey (SASS)    Mary Rollefson
98-08  The Redesign of the Schools and Staffing Survey for 1999-2000: A Position Paper    Dan Kasprzyk

Staff - higher education institutions

97-26  Strategies for Improving Accuracy of Postsecondary Faculty Lists    Linda Zimbler

Staff - nonprofessional

2000-13  Non-professional Staff in the Schools and Staffing Survey (SASS) and Common Core of Data (CCD)    Kerry Gruber

State

1999-03  Evaluation of the 1996-97 Nonfiscal Common Core of Data Surveys Data Collection, Processing, and Editing Cycle    Beth Young

Statistical methodology

97-21  Statistics for Policymakers or Everything You Wanted to Know About Statistics But Thought You Could Never Understand    Susan Ahmed

Statistical standards and methodology

2001-05  Using TIMSS to Analyze Correlates of Performance Variation in Mathematics    Patrick Gonzales

Students with disabilities

95-13  Assessing Students with Disabilities and Limited English Proficiency    James Houser
2001-13  The Effects of Accommodations on the Assessment of LEP Students in NAEP    Arnold Goldstein

Survey methodology

96-17  National Postsecondary Student Aid Study: 1996 Field Test Methodology Report    Andrew G. Malizio
97-15  Customer Service Survey: Common Core of Data Coordinators    Lee Hoffman
97-35  Design, Data Collection, Interview Administration Time, and Data Editing in the 1996 National Household Education Survey    Kathryn Chandler
98-06  National Education Longitudinal Study of 1988 (NELS:88) Base Year through Second Follow-Up: Final Methodology Report    Ralph Lee
98-11  Beginning Postsecondary Students Longitudinal Study First Follow-up (BPS:96-98) Field Test Report    Aurora D’Amico
98-16  A Feasibility Study of Longitudinal Design for Schools and Staffing Survey    Stephen Broughman
1999-07  Collection of Resource and Expenditure Data on the Schools and Staffing Survey    Stephen Broughman
1999-17  Secondary Use of the Schools and Staffing Survey Data    Susan Wiley
2000-01  1999 National Study of Postsecondary Faculty (NSOPF:99) Field Test Report    Linda Zimbler
2000-02  Coordinating NCES Surveys: Options, Issues, Challenges, and Next Steps    Valena Plisko
2000-04  Selected Papers on Education Surveys: Papers Presented at the 1998 and 1999 ASA and 1999 AAPOR Meetings    Dan Kasprzyk
2000-12  Coverage Evaluation of the 1994-95 Public Elementary/Secondary School Universe Survey    Beth Young
2000-17  National Postsecondary Student Aid Study:2000 Field Test Methodology Report    Andrew G. Malizio
2001-04  Beginning Postsecondary Students Longitudinal Study: 1996-2001 (BPS:1996/2001) Field Test Methodology Report    Paula Knepper
2001-07  A Comparison of the National Assessment of Educational Progress (NAEP), the Third International Mathematics and Science Study Repeat (TIMSS-R), and the Programme for International Student Assessment (PISA)    Arnold Goldstein
2001-09  An Assessment of the Accuracy of CCD Data: A Comparison of 1988, 1989, and 1990 CCD Data with 1990-91 SASS Data    John Sietsema
2001-11  Impact of Selected Background Variables on Students’ NAEP Math Performance    Arnold Goldstein
2001-13  The Effects of Accommodations on the Assessment of LEP Students in NAEP    Arnold Goldstein

Teachers

98-13  Response Variance in the 1994-95 Teacher Follow-up Survey    Steven Kaufman
1999-14  1994-95 Teacher Followup Survey: Data File User’s Manual, Restricted-Use Codebook    Kerry Gruber
2000-10  A Research Agenda for the 1999-2000 Schools and Staffing Survey    Dan Kasprzyk

Teachers - instructional practices of

98-08  The Redesign of the Schools and Staffing Survey for 1999-2000: A Position Paper    Dan Kasprzyk

Teachers - opinions regarding safety

98-08  The Redesign of the Schools and Staffing Survey for 1999-2000: A Position Paper    Dan Kasprzyk

Teachers - performance evaluations

1999-04  Measuring Teacher Qualifications    Dan Kasprzyk

Teachers - qualifications of 

1999-04  Measuring Teacher Qualifications    Dan Kasprzyk



Teachers - salaries of 

94-05  Cost-of-Education Differentials Across the States    William J. Fowler, Jr.



Training 

2000-16a  Lifelong Learning NCES Task Force: Final Report Volume I    Lisa Hudson
2000-16b  Lifelong Learning NCES Task Force: Final Report Volume II    Lisa Hudson



Variance estimation 

2000-03  Strengths and Limitations of Using SUDAAN, Stata, and WesVarPC for Computing Variances from NCES Data Sets    Ralph Lee
2000-04  Selected Papers on Education Surveys: Papers Presented at the 1998 and 1999 ASA and 1999 AAPOR Meetings    Dan Kasprzyk



Violence 

97-09  Status of Data on Crime and Violence in Schools: Final Report    Lee Hoffman



Vocational education 

95-12  Rural Education Data User’s Guide    Samuel Peng
1999-05  Procedures Guide for Transcript Studies    Dawn Nelson
1999-06  1998 Revision of the Secondary School Taxonomy    Dawn Nelson